Toronto Crime Predictions
Table of Contents
- Introduction
- Retrieve Data from API
- Preprocess the Data
- Create the Model
- Creating the Testing and Training Datasets - First Approach
- Testing Different Models - First Approach
- Optimizing the models - First Approach
- Creating the Testing and Training Datasets - Second Approach
- Testing Different Models - Second Approach
- Optimizing the models - Second Approach
- Auto Theft Models
- Total Count Models
- Apply Regression Chain Boosting Algorithm on RF Regressor
- Perform Hyper-Parameter Tuning on RF Regression Chain
- Apply ADA Boosting Algorithm on HBGB
- Perform Hyper-Parameter Tuning on ADA Boosted HBGB Regressor
- Perform Hyper-Parameter Tuning on RF Regressor
- Perform Hyper-Parameter Tuning on HBGB Regressor
- Create a Voting Ensemble Learning Model with the default RF and HBGB Models
- Results
- Visualizations based on current data
- Visualizations based on the Predictions
- Anticipated Crime Statistics for next six months of 2024
- Anticipated Total count of Crime Activities for upcoming three years
- Anticipated Crime Statistics for Upcoming Years with a Month Breakdown
- Anticipated Crime Statistics for 2025
- Anticipated Crime Statistics for 2026
- Anticipated Crime Statistics for 2027
- Summary and Conclusion
Introduction
Project Overview
This project analyzes patterns and trends in Toronto's criminal activity data from January 2021 to June 2024, obtained from the Major Crime Indicators (MCI) dataset. The objective is to predict future patterns of criminal activity based on the patterns present in the current data.
Tools & Technologies Used
Data Retrieval:
Requests (Python) for making API calls to retrieve the data for analysis and visualization.
Data Manipulation:
Pandas for the majority of the data cleaning, preprocessing, and manipulation steps.
GeoPandas for preprocessing and manipulating GeoJSON data.
Machine Learning:
Scikit-learn for applying regression algorithms (e.g., Random Forest Regressor, Histogram-Based Gradient Boosting) and ensemble learning and optimization methods (e.g., Voting Regressor, AdaBoost, hyper-parameter tuning) to predict future crime trends.
Data Visualization:
Matplotlib and Seaborn (Python) for visualizing crime trends over time.
Folium (Python) for visualizing geospatial data.
Data Sources
Toronto Police Service Major Crime Indicators (MCI) Data:
This dataset includes all Major Crime Indicators (MCI) occurrences by reported date and related offences since 2014. The Major Crime Indicators categories include Assault, Break and Enter, Auto Theft, Robbery and Theft Over (Excludes Sexual Violations).
This data is provided at the offence and/or victim level, therefore one occurrence number may have several rows of data associated to the various MCIs used to categorize the occurrence.
This data does not include occurrences that have been deemed unfounded. The definition of unfounded according to Statistics Canada is: “It has been determined through police investigation that the offence reported did not occur, nor was it attempted” (Statistics Canada, 2020).
The location of crime occurrences have been deliberately offset to the nearest road intersection node to protect the privacy of parties involved in the occurrence...Due to the offset of occurrence location, the numbers by...Neighbourhood may not reflect the exact count of occurrences reported within these [neighbourhoods].
-- Toronto Police Service Public Safety Data Portal
- Further information on the dataset is available on the Toronto Police Service Public Safety Data Portal.
- The data is accessed through an ArcGIS REST API.
Neighbourhoods Data:
- City-of-Toronto-designated social planning neighbourhood boundaries defined primarily to help City staff collect data, plan, analyze and forecast City services.
- Further information available here.
Neighbourhood boundary coordinates retrieved from the GeoJSON file:
| | geometry | _id | AREA_ID | AREA_ATTR_ID | PARENT_AREA_ID | AREA_SHORT_CODE | AREA_LONG_CODE | AREA_NAME | AREA_DESC | CLASSIFICATION | CLASSIFICATION_CODE | OBJECTID |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | MULTIPOLYGON (((-79.38635 43.69783, -79.38623 ... | 1 | 2502366 | 26022881 | 0 | 174 | 174 | South Eglinton-Davisville | South Eglinton-Davisville (174) | Not an NIA or Emerging Neighbourhood | NA | 17824737.0 |
| 1 | MULTIPOLYGON (((-79.39744 43.70693, -79.39837 ... | 2 | 2502365 | 26022880 | 0 | 173 | 173 | North Toronto | North Toronto (173) | Not an NIA or Emerging Neighbourhood | NA | 17824753.0 |
| 2 | MULTIPOLYGON (((-79.43411 43.66015, -79.43537 ... | 3 | 2502364 | 26022879 | 0 | 172 | 172 | Dovercourt Village | Dovercourt Village (172) | Not an NIA or Emerging Neighbourhood | NA | 17824769.0 |
| 3 | MULTIPOLYGON (((-79.4387 43.66766, -79.43841 4... | 4 | 2502363 | 26022878 | 0 | 171 | 171 | Junction-Wallace Emerson | Junction-Wallace Emerson (171) | Not an NIA or Emerging Neighbourhood | NA | 17824785.0 |
| 4 | MULTIPOLYGON (((-79.38404 43.64497, -79.38502 ... | 5 | 2502362 | 26022877 | 0 | 170 | 170 | Yonge-Bay Corridor | Yonge-Bay Corridor (170) | Not an NIA or Emerging Neighbourhood | NA | 17824801.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 153 | MULTIPOLYGON (((-79.59037 43.73401, -79.58942 ... | 154 | 2502213 | 26022728 | 0 | 001 | 001 | West Humber-Clairville | West Humber-Clairville (1) | Not an NIA or Emerging Neighbourhood | NA | 17827185.0 |
| 154 | MULTIPOLYGON (((-79.51915 43.77399, -79.51901 ... | 155 | 2502212 | 26022727 | 0 | 024 | 024 | Black Creek | Black Creek (24) | Neighbourhood Improvement Area | NIA | 17827201.0 |
| 155 | MULTIPOLYGON (((-79.53225 43.73505, -79.52938 ... | 156 | 2502211 | 26022726 | 0 | 023 | 023 | Pelmo Park-Humberlea | Pelmo Park-Humberlea (23) | Not an NIA or Emerging Neighbourhood | NA | 17827217.0 |
| 156 | MULTIPOLYGON (((-79.52813 43.74425, -79.52721 ... | 157 | 2502210 | 26022725 | 0 | 022 | 022 | Humbermede | Humbermede (22) | Neighbourhood Improvement Area | NIA | 17827233.0 |
| 157 | MULTIPOLYGON (((-79.53396 43.76886, -79.53227 ... | 158 | 2502209 | 26022724 | 0 | 021 | 021 | Humber Summit | Humber Summit (21) | Neighbourhood Improvement Area | NIA | 17827249.0 |
158 rows × 12 columns
Keeping only the columns with the neighbourhood names and boundary coordinates (the name column is labelled NEIGHBOURHOOD_158 to match the crime data):
| | geometry | NEIGHBOURHOOD_158 |
|---|---|---|
| 0 | MULTIPOLYGON (((-79.38635 43.69783, -79.38623 ... | South Eglinton-Davisville (174) |
| 1 | MULTIPOLYGON (((-79.39744 43.70693, -79.39837 ... | North Toronto (173) |
| 2 | MULTIPOLYGON (((-79.43411 43.66015, -79.43537 ... | Dovercourt Village (172) |
| 3 | MULTIPOLYGON (((-79.4387 43.66766, -79.43841 4... | Junction-Wallace Emerson (171) |
| 4 | MULTIPOLYGON (((-79.38404 43.64497, -79.38502 ... | Yonge-Bay Corridor (170) |
| ... | ... | ... |
| 153 | MULTIPOLYGON (((-79.59037 43.73401, -79.58942 ... | West Humber-Clairville (1) |
| 154 | MULTIPOLYGON (((-79.51915 43.77399, -79.51901 ... | Black Creek (24) |
| 155 | MULTIPOLYGON (((-79.53225 43.73505, -79.52938 ... | Pelmo Park-Humberlea (23) |
| 156 | MULTIPOLYGON (((-79.52813 43.74425, -79.52721 ... | Humbermede (22) |
| 157 | MULTIPOLYGON (((-79.53396 43.76886, -79.53227 ... | Humber Summit (21) |
158 rows × 2 columns
Extracting the numeric code of each neighbourhood into a separate column (HOOD_158) for labeling purposes:
| | geometry | NEIGHBOURHOOD_158 | HOOD_158 |
|---|---|---|---|
| 0 | MULTIPOLYGON (((-79.38635 43.69783, -79.38623 ... | South Eglinton-Davisville (174) | 174 |
| 1 | MULTIPOLYGON (((-79.39744 43.70693, -79.39837 ... | North Toronto (173) | 173 |
| 2 | MULTIPOLYGON (((-79.43411 43.66015, -79.43537 ... | Dovercourt Village (172) | 172 |
| 3 | MULTIPOLYGON (((-79.4387 43.66766, -79.43841 4... | Junction-Wallace Emerson (171) | 171 |
| 4 | MULTIPOLYGON (((-79.38404 43.64497, -79.38502 ... | Yonge-Bay Corridor (170) | 170 |
| ... | ... | ... | ... |
| 153 | MULTIPOLYGON (((-79.59037 43.73401, -79.58942 ... | West Humber-Clairville (1) | 1 |
| 154 | MULTIPOLYGON (((-79.51915 43.77399, -79.51901 ... | Black Creek (24) | 24 |
| 155 | MULTIPOLYGON (((-79.53225 43.73505, -79.52938 ... | Pelmo Park-Humberlea (23) | 23 |
| 156 | MULTIPOLYGON (((-79.52813 43.74425, -79.52721 ... | Humbermede (22) | 22 |
| 157 | MULTIPOLYGON (((-79.53396 43.76886, -79.53227 ... | Humber Summit (21) | 21 |
158 rows × 3 columns
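The two extraction steps above, keeping the neighbourhood name column and copying its trailing number into HOOD_158, might look like the following pandas sketch. It assumes the name column is AREA_DESC renamed to NEIGHBOURHOOD_158 (the values match, but the rename is not stated), and it uses a plain DataFrame with three sample names in place of the GeoDataFrame.

```python
import pandas as pd

# Stand-in for the GeoDataFrame loaded from the neighbourhoods GeoJSON file;
# the geometry column is left out to keep the example self-contained.
hoods = pd.DataFrame({
    "AREA_DESC": [
        "South Eglinton-Davisville (174)",
        "Black Creek (24)",
        "West Humber-Clairville (1)",
    ]
})

# Assumption: AREA_DESC is the column renamed to NEIGHBOURHOOD_158 to match the crime data
hoods = hoods.rename(columns={"AREA_DESC": "NEIGHBOURHOOD_158"})

# Capture the digits inside the trailing parentheses and store them as an integer code
hoods["HOOD_158"] = (
    hoods["NEIGHBOURHOOD_158"]
    .str.extract(r"\((\d+)\)\s*$", expand=False)
    .astype(int)
)
print(hoods)
```

Anchoring the pattern to the end of the string picks up the final parenthesised number, which is where both the GeoJSON names and the crime data place the code.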
Retrieve Data from API
Extracting the crime data from the API in batches of 2,000 records per call, which is the API's transfer limit.
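A minimal sketch of batched retrieval from an ArcGIS REST query endpoint, assuming offset-based paging with the standard resultOffset and resultRecordCount query parameters. QUERY_URL is a hypothetical placeholder (the real endpoint is not given here), fetch_page and fetch_all are illustrative helper names, and the standard library's urllib stands in for Requests so the sketch has no third-party dependency; with Requests, requests.get(QUERY_URL, params=params, timeout=60).json() would play the same role.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, List

PAGE_SIZE = 2000  # per-call transfer limit described above

# Hypothetical placeholder: substitute the MCI layer's actual query endpoint
QUERY_URL = "https://example.com/arcgis/rest/services/MCI/FeatureServer/0/query"


def fetch_page(offset: int, where: str = "1=1") -> dict:
    """Request one page of GeoJSON features starting at `offset`."""
    params = {
        "where": where,               # e.g. a year filter; "1=1" selects everything
        "outFields": "*",
        "orderByFields": "OBJECTID",  # stable ordering so pages neither overlap nor skip
        "resultOffset": offset,
        "resultRecordCount": PAGE_SIZE,
        "f": "geojson",
    }
    url = f"{QUERY_URL}?{urllib.parse.urlencode(params)}"
    with urllib.request.urlopen(url, timeout=60) as response:
        return json.load(response)


def fetch_all(fetch: Callable[[int], dict], page_size: int = PAGE_SIZE) -> List[dict]:
    """Page through the layer until a page returns fewer than `page_size` features."""
    features: List[dict] = []
    offset = 0
    while True:
        page = fetch(offset).get("features", [])
        features.extend(page)
        if len(page) < page_size:
            return features
        offset += page_size
```

Usage would be `features = fetch_all(fetch_page)`. Passing the page-fetching function in as an argument keeps the paging loop testable without network access.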
Previewing the structure of the returned JSON array (first three features):
[{'type': 'Feature',
'id': 246675,
'geometry': {'type': 'Point',
'coordinates': [-79.425761926, 43.6817690130001]},
'properties': {'OBJECTID': 246675,
'EVENT_UNIQUE_ID': 'GO-20213605',
'REPORT_DATE': 1609477200000,
'OCC_DATE': 1609477200000,
'REPORT_YEAR': 2021,
'REPORT_MONTH': 'January',
'REPORT_DAY': 1,
'REPORT_DOY': 1,
'REPORT_DOW': 'Friday ',
'REPORT_HOUR': 16,
'OCC_YEAR': 2021,
'OCC_MONTH': 'January',
'OCC_DAY': 1,
'OCC_DOY': 1,
'OCC_DOW': 'Friday ',
'OCC_HOUR': 16,
'DIVISION': 'D13',
'LOCATION_TYPE': 'Parking Lots (Apt., Commercial Or Non-Commercial)',
'PREMISES_TYPE': 'Outside',
'UCR_CODE': 2135,
'UCR_EXT': 210,
'OFFENCE': 'Theft Of Motor Vehicle',
'MCI_CATEGORY': 'Auto Theft',
'HOOD_158': '094',
'NEIGHBOURHOOD_158': 'Wychwood (94)',
'HOOD_140': '094',
'NEIGHBOURHOOD_140': 'Wychwood (94)',
'LONG_WGS84': -79.42576192637651,
'LAT_WGS84': 43.68176901263976}},
{'type': 'Feature',
'id': 246676,
'geometry': {'type': 'Point',
'coordinates': [5.6843418860808e-14, 5.08888749034163e-14]},
'properties': {'OBJECTID': 246676,
'EVENT_UNIQUE_ID': 'GO-20213400',
'REPORT_DATE': 1609477200000,
'OCC_DATE': 1609477200000,
'REPORT_YEAR': 2021,
'REPORT_MONTH': 'January',
'REPORT_DAY': 1,
'REPORT_DOY': 1,
'REPORT_DOW': 'Friday ',
'REPORT_HOUR': 16,
'OCC_YEAR': 2021,
'OCC_MONTH': 'January',
'OCC_DAY': 1,
'OCC_DOY': 1,
'OCC_DOW': 'Friday ',
'OCC_HOUR': 4,
'DIVISION': 'D33',
'LOCATION_TYPE': 'Other Commercial / Corporate Places (For Profit, Warehouse, Corp. Bldg',
'PREMISES_TYPE': 'Commercial',
'UCR_CODE': 2135,
'UCR_EXT': 210,
'OFFENCE': 'Theft Of Motor Vehicle',
'MCI_CATEGORY': 'Auto Theft',
'HOOD_158': 'NSA',
'NEIGHBOURHOOD_158': 'NSA',
'HOOD_140': 'NSA',
'NEIGHBOURHOOD_140': 'NSA',
'LONG_WGS84': 0,
'LAT_WGS84': 0}},
{'type': 'Feature',
'id': 246677,
'geometry': {'type': 'Point', 'coordinates': [-79.460110312, 43.721012854]},
'properties': {'OBJECTID': 246677,
'EVENT_UNIQUE_ID': 'GO-20211123',
'REPORT_DATE': 1609477200000,
'OCC_DATE': 1609477200000,
'REPORT_YEAR': 2021,
'REPORT_MONTH': 'January',
'REPORT_DAY': 1,
'REPORT_DOY': 1,
'REPORT_DOW': 'Friday ',
'REPORT_HOUR': 7,
'OCC_YEAR': 2021,
'OCC_MONTH': 'January',
'OCC_DAY': 1,
'OCC_DOY': 1,
'OCC_DOW': 'Friday ',
'OCC_HOUR': 4,
'DIVISION': 'D32',
'LOCATION_TYPE': "Other Non Commercial / Corporate Places (Non-Profit, Gov'T, Firehall)",
'PREMISES_TYPE': 'Other',
'UCR_CODE': 2135,
'UCR_EXT': 210,
'OFFENCE': 'Theft Of Motor Vehicle',
'MCI_CATEGORY': 'Auto Theft',
'HOOD_158': '031',
'NEIGHBOURHOOD_158': 'Yorkdale-Glen Park (31)',
'HOOD_140': '031',
'NEIGHBOURHOOD_140': 'Yorkdale-Glen Park (31)',
'LONG_WGS84': -79.46011031171706,
'LAT_WGS84': 43.72101285418029}}]
Filtering out the GeoJSON metadata (type, id, and geometry) and keeping only the crime data stored in each feature's properties:
[{'OBJECTID': 246675,
'EVENT_UNIQUE_ID': 'GO-20213605',
'REPORT_DATE': 1609477200000,
'OCC_DATE': 1609477200000,
'REPORT_YEAR': 2021,
'REPORT_MONTH': 'January',
'REPORT_DAY': 1,
'REPORT_DOY': 1,
'REPORT_DOW': 'Friday ',
'REPORT_HOUR': 16,
'OCC_YEAR': 2021,
'OCC_MONTH': 'January',
'OCC_DAY': 1,
'OCC_DOY': 1,
'OCC_DOW': 'Friday ',
'OCC_HOUR': 16,
'DIVISION': 'D13',
'LOCATION_TYPE': 'Parking Lots (Apt., Commercial Or Non-Commercial)',
'PREMISES_TYPE': 'Outside',
'UCR_CODE': 2135,
'UCR_EXT': 210,
'OFFENCE': 'Theft Of Motor Vehicle',
'MCI_CATEGORY': 'Auto Theft',
'HOOD_158': '094',
'NEIGHBOURHOOD_158': 'Wychwood (94)',
'HOOD_140': '094',
'NEIGHBOURHOOD_140': 'Wychwood (94)',
'LONG_WGS84': -79.42576192637651,
'LAT_WGS84': 43.68176901263976},
{'OBJECTID': 246676,
'EVENT_UNIQUE_ID': 'GO-20213400',
'REPORT_DATE': 1609477200000,
'OCC_DATE': 1609477200000,
'REPORT_YEAR': 2021,
'REPORT_MONTH': 'January',
'REPORT_DAY': 1,
'REPORT_DOY': 1,
'REPORT_DOW': 'Friday ',
'REPORT_HOUR': 16,
'OCC_YEAR': 2021,
'OCC_MONTH': 'January',
'OCC_DAY': 1,
'OCC_DOY': 1,
'OCC_DOW': 'Friday ',
'OCC_HOUR': 4,
'DIVISION': 'D33',
'LOCATION_TYPE': 'Other Commercial / Corporate Places (For Profit, Warehouse, Corp. Bldg',
'PREMISES_TYPE': 'Commercial',
'UCR_CODE': 2135,
'UCR_EXT': 210,
'OFFENCE': 'Theft Of Motor Vehicle',
'MCI_CATEGORY': 'Auto Theft',
'HOOD_158': 'NSA',
'NEIGHBOURHOOD_158': 'NSA',
'HOOD_140': 'NSA',
'NEIGHBOURHOOD_140': 'NSA',
'LONG_WGS84': 0,
'LAT_WGS84': 0},
{'OBJECTID': 246677,
'EVENT_UNIQUE_ID': 'GO-20211123',
'REPORT_DATE': 1609477200000,
'OCC_DATE': 1609477200000,
'REPORT_YEAR': 2021,
'REPORT_MONTH': 'January',
'REPORT_DAY': 1,
'REPORT_DOY': 1,
'REPORT_DOW': 'Friday ',
'REPORT_HOUR': 7,
'OCC_YEAR': 2021,
'OCC_MONTH': 'January',
'OCC_DAY': 1,
'OCC_DOY': 1,
'OCC_DOW': 'Friday ',
'OCC_HOUR': 4,
'DIVISION': 'D32',
'LOCATION_TYPE': "Other Non Commercial / Corporate Places (Non-Profit, Gov'T, Firehall)",
'PREMISES_TYPE': 'Other',
'UCR_CODE': 2135,
'UCR_EXT': 210,
'OFFENCE': 'Theft Of Motor Vehicle',
'MCI_CATEGORY': 'Auto Theft',
'HOOD_158': '031',
'NEIGHBOURHOOD_158': 'Yorkdale-Glen Park (31)',
'HOOD_140': '031',
'NEIGHBOURHOOD_140': 'Yorkdale-Glen Park (31)',
'LONG_WGS84': -79.46011031171706,
'LAT_WGS84': 43.72101285418029}]
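The filtering step above amounts to keeping each feature's properties dictionary, as in this small sketch (extract_properties is an illustrative name, not taken from the project). Because the properties already carry LONG_WGS84 and LAT_WGS84, dropping the geometry object loses no location information.

```python
from typing import Any, Dict, List


def extract_properties(features: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Drop the GeoJSON wrapper (type, id, geometry) and keep each feature's attribute record."""
    return [feature["properties"] for feature in features]


# A trimmed-down feature shaped like the preview above
features = [
    {
        "type": "Feature",
        "id": 246675,
        "geometry": {"type": "Point", "coordinates": [-79.425761926, 43.6817690130001]},
        "properties": {"OBJECTID": 246675, "MCI_CATEGORY": "Auto Theft", "HOOD_158": "094"},
    }
]
records = extract_properties(features)
print(records)  # [{'OBJECTID': 246675, 'MCI_CATEGORY': 'Auto Theft', 'HOOD_158': '094'}]
```

Passing the resulting list of dictionaries to `pd.DataFrame(records)` gives one column per property.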
Converting the list of records into a Pandas DataFrame:
| | OBJECTID | EVENT_UNIQUE_ID | REPORT_DATE | OCC_DATE | REPORT_YEAR | REPORT_MONTH | REPORT_DAY | REPORT_DOY | REPORT_DOW | REPORT_HOUR | ... | UCR_CODE | UCR_EXT | OFFENCE | MCI_CATEGORY | HOOD_158 | NEIGHBOURHOOD_158 | HOOD_140 | NEIGHBOURHOOD_140 | LONG_WGS84 | LAT_WGS84 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 246675 | GO-20213605 | 1609477200000 | 1609477200000 | 2021 | January | 1 | 1 | Friday | 16 | ... | 2135 | 210 | Theft Of Motor Vehicle | Auto Theft | 094 | Wychwood (94) | 094 | Wychwood (94) | -79.425762 | 43.681769 |
| 1 | 246676 | GO-20213400 | 1609477200000 | 1609477200000 | 2021 | January | 1 | 1 | Friday | 16 | ... | 2135 | 210 | Theft Of Motor Vehicle | Auto Theft | NSA | NSA | NSA | NSA | 0.000000 | 0.000000 |
| 2 | 246677 | GO-20211123 | 1609477200000 | 1609477200000 | 2021 | January | 1 | 1 | Friday | 7 | ... | 2135 | 210 | Theft Of Motor Vehicle | Auto Theft | 031 | Yorkdale-Glen Park (31) | 031 | Yorkdale-Glen Park (31) | -79.460110 | 43.721013 |
| 3 | 246678 | GO-2021445 | 1609477200000 | 1609477200000 | 2021 | January | 1 | 1 | Friday | 1 | ... | 2135 | 210 | Theft Of Motor Vehicle | Auto Theft | 151 | Yonge-Doris (151) | 051 | Willowdale East (51) | -79.415293 | 43.778743 |
| 4 | 246679 | GO-20213400 | 1609477200000 | 1609477200000 | 2021 | January | 1 | 1 | Friday | 16 | ... | 2135 | 210 | Theft Of Motor Vehicle | Auto Theft | NSA | NSA | NSA | NSA | 0.000000 | 0.000000 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 147546 | 396731 | GO-20241427047 | 1719723600000 | 1719637200000 | 2024 | June | 30 | 182 | Sunday | 16 | ... | 1430 | 100 | Assault | Assault | 071 | Cabbagetown-South St.James Town (71) | 071 | Cabbagetown-South St.James Town (71) | -79.373043 | 43.663195 |
| 147547 | 396732 | GO-20241427869 | 1719723600000 | 1719723600000 | 2024 | June | 30 | 182 | Sunday | 18 | ... | 2133 | 200 | Theft Over - Shoplifting | Theft Over | 027 | York University Heights (27) | 027 | York University Heights (27) | -79.464942 | 43.759469 |
| 147548 | 396733 | GO-20241423116 | 1719723600000 | 1719637200000 | 2024 | June | 30 | 182 | Sunday | 2 | ... | 1450 | 120 | Discharge Firearm With Intent | Assault | 144 | Morningside Heights (144) | 131 | Rouge (131) | -79.248477 | 43.837237 |
| 147549 | 396734 | GO-20241426669 | 1719723600000 | 1718859600000 | 2024 | June | 30 | 182 | Sunday | 15 | ... | 2132 | 200 | Theft From Motor Vehicle Over | Theft Over | 160 | Mimico-Queensway (160) | 017 | Mimico (includes Humber Bay Shores) (17) | -79.521053 | 43.616490 |
| 147550 | 396735 | GO-20241425318 | 1719723600000 | 1719637200000 | 2024 | June | 30 | 182 | Sunday | 11 | ... | 1430 | 100 | Assault | Assault | 018 | New Toronto (18) | 018 | New Toronto (18) | -79.513940 | 43.598831 |
147551 rows × 29 columns
Preprocess the Data
Previewing all the columns (features) of the DataFrame:
Index(['OBJECTID', 'EVENT_UNIQUE_ID', 'REPORT_DATE', 'OCC_DATE', 'REPORT_YEAR',
'REPORT_MONTH', 'REPORT_DAY', 'REPORT_DOY', 'REPORT_DOW', 'REPORT_HOUR',
'OCC_YEAR', 'OCC_MONTH', 'OCC_DAY', 'OCC_DOY', 'OCC_DOW', 'OCC_HOUR',
'DIVISION', 'LOCATION_TYPE', 'PREMISES_TYPE', 'UCR_CODE', 'UCR_EXT',
'OFFENCE', 'MCI_CATEGORY', 'HOOD_158', 'NEIGHBOURHOOD_158', 'HOOD_140',
'NEIGHBOURHOOD_140', 'LONG_WGS84', 'LAT_WGS84'],
dtype='object')
Replacing the placeholder value NSA in the neighbourhood numeric code column (HOOD_158) with 0 and converting the codes to integers, for encoding purposes:
| | OBJECTID | EVENT_UNIQUE_ID | REPORT_DATE | OCC_DATE | REPORT_YEAR | REPORT_MONTH | REPORT_DAY | REPORT_DOY | REPORT_DOW | REPORT_HOUR | ... | UCR_CODE | UCR_EXT | OFFENCE | MCI_CATEGORY | HOOD_158 | NEIGHBOURHOOD_158 | HOOD_140 | NEIGHBOURHOOD_140 | LONG_WGS84 | LAT_WGS84 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 246675 | GO-20213605 | 1609477200000 | 1609477200000 | 2021 | January | 1 | 1 | Friday | 16 | ... | 2135 | 210 | Theft Of Motor Vehicle | Auto Theft | 94 | Wychwood (94) | 094 | Wychwood (94) | -79.425762 | 43.681769 |
| 1 | 246676 | GO-20213400 | 1609477200000 | 1609477200000 | 2021 | January | 1 | 1 | Friday | 16 | ... | 2135 | 210 | Theft Of Motor Vehicle | Auto Theft | 0 | NSA | NSA | NSA | 0.000000 | 0.000000 |
| 2 | 246677 | GO-20211123 | 1609477200000 | 1609477200000 | 2021 | January | 1 | 1 | Friday | 7 | ... | 2135 | 210 | Theft Of Motor Vehicle | Auto Theft | 31 | Yorkdale-Glen Park (31) | 031 | Yorkdale-Glen Park (31) | -79.460110 | 43.721013 |
| 3 | 246678 | GO-2021445 | 1609477200000 | 1609477200000 | 2021 | January | 1 | 1 | Friday | 1 | ... | 2135 | 210 | Theft Of Motor Vehicle | Auto Theft | 151 | Yonge-Doris (151) | 051 | Willowdale East (51) | -79.415293 | 43.778743 |
| 4 | 246679 | GO-20213400 | 1609477200000 | 1609477200000 | 2021 | January | 1 | 1 | Friday | 16 | ... | 2135 | 210 | Theft Of Motor Vehicle | Auto Theft | 0 | NSA | NSA | NSA | 0.000000 | 0.000000 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 147546 | 396731 | GO-20241427047 | 1719723600000 | 1719637200000 | 2024 | June | 30 | 182 | Sunday | 16 | ... | 1430 | 100 | Assault | Assault | 71 | Cabbagetown-South St.James Town (71) | 071 | Cabbagetown-South St.James Town (71) | -79.373043 | 43.663195 |
| 147547 | 396732 | GO-20241427869 | 1719723600000 | 1719723600000 | 2024 | June | 30 | 182 | Sunday | 18 | ... | 2133 | 200 | Theft Over - Shoplifting | Theft Over | 27 | York University Heights (27) | 027 | York University Heights (27) | -79.464942 | 43.759469 |
| 147548 | 396733 | GO-20241423116 | 1719723600000 | 1719637200000 | 2024 | June | 30 | 182 | Sunday | 2 | ... | 1450 | 120 | Discharge Firearm With Intent | Assault | 144 | Morningside Heights (144) | 131 | Rouge (131) | -79.248477 | 43.837237 |
| 147549 | 396734 | GO-20241426669 | 1719723600000 | 1718859600000 | 2024 | June | 30 | 182 | Sunday | 15 | ... | 2132 | 200 | Theft From Motor Vehicle Over | Theft Over | 160 | Mimico-Queensway (160) | 017 | Mimico (includes Humber Bay Shores) (17) | -79.521053 | 43.616490 |
| 147550 | 396735 | GO-20241425318 | 1719723600000 | 1719637200000 | 2024 | June | 30 | 182 | Sunday | 11 | ... | 1430 | 100 | Assault | Assault | 18 | New Toronto (18) | 018 | New Toronto (18) | -79.513940 | 43.598831 |
147551 rows × 29 columns
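A minimal pandas sketch of the NSA replacement, using a few sample codes from the preview. Converting with pd.to_numeric is an assumption about how the codes were turned into integers; it maps zero-padded strings such as "094" to 94, as in the table above. Since the neighbourhood codes start at 1, the value 0 does not collide with any real neighbourhood.

```python
import pandas as pd

hood = pd.Series(["094", "NSA", "031", "151"], name="HOOD_158")

# "NSA" marks rows without a neighbourhood; map it to "0" so every value is numeric
hood = pd.to_numeric(hood.replace("NSA", "0"))
print(hood.tolist())  # [94, 0, 31, 151]
```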
Previewing the occurrence month column (OCC_MONTH) to verify that the month names are written in full rather than abbreviated:
| | OCC_MONTH |
|---|---|
| 0 | January |
| 1 | January |
| 2 | January |
| 3 | January |
| 4 | January |
| ... | ... |
| 147546 | June |
| 147547 | June |
| 147548 | June |
| 147549 | June |
| 147550 | June |
147551 rows × 1 columns
Encoding the month column by converting each month name to its month number (e.g., January becomes 1 and June becomes 6):
| | OCC_MONTH |
|---|---|
| 0 | 1 |
| 1 | 1 |
| 2 | 1 |
| 3 | 1 |
| 4 | 1 |
| ... | ... |
| 147546 | 6 |
| 147547 | 6 |
| 147548 | 6 |
| 147549 | 6 |
| 147550 | 6 |
147551 rows × 1 columns
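The check and the encoding above could be combined as in this sketch, which builds the name-to-number mapping from Python's calendar module (MONTH_TO_NUMBER is a hypothetical name). calendar.month_name follows the current locale, so this assumes an English locale; `pd.to_datetime(series, format="%B").dt.month` is another option.

```python
import calendar

import pandas as pd

# {"January": 1, ..., "December": 12}; month_name[0] is an empty string, so it is skipped
MONTH_TO_NUMBER = {name: number for number, name in enumerate(calendar.month_name) if name}

# Strip stray whitespace first (the preview shows values such as 'Friday ' in the DOW columns)
occ_month = pd.Series(["January", "January ", "June"]).str.strip()

# Any label missing from the mapping (e.g. an abbreviation such as "Jan") is reported here
unexpected = set(occ_month) - set(MONTH_TO_NUMBER)
if unexpected:
    raise ValueError(f"Unexpected month labels: {sorted(unexpected)}")

encoded = occ_month.map(MONTH_TO_NUMBER)
print(encoded.tolist())  # [1, 1, 6]
```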
Keeping only the columns that are candidate features for the machine learning model, along with the target column, MCI_CATEGORY:
| | EVENT_UNIQUE_ID | NEIGHBOURHOOD_158 | HOOD_158 | LAT_WGS84 | LONG_WGS84 | PREMISES_TYPE | OCC_DATE | OCC_YEAR | OCC_MONTH | OCC_DAY | OCC_HOUR | MCI_CATEGORY |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | GO-20213605 | Wychwood (94) | 94 | 43.681769 | -79.425762 | Outside | 1609477200000 | 2021 | 1 | 1 | 16 | Auto Theft |
| 1 | GO-20213400 | NSA | 0 | 0.000000 | 0.000000 | Commercial | 1609477200000 | 2021 | 1 | 1 | 4 | Auto Theft |
| 2 | GO-20211123 | Yorkdale-Glen Park (31) | 31 | 43.721013 | -79.460110 | Other | 1609477200000 | 2021 | 1 | 1 | 4 | Auto Theft |
| 3 | GO-2021445 | Yonge-Doris (151) | 151 | 43.778743 | -79.415293 | Other | 1609477200000 | 2021 | 1 | 1 | 1 | Auto Theft |
| 4 | GO-20213400 | NSA | 0 | 0.000000 | 0.000000 | Commercial | 1609477200000 | 2021 | 1 | 1 | 4 | Auto Theft |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 147546 | GO-20241427047 | Cabbagetown-South St.James Town (71) | 71 | 43.663195 | -79.373043 | Apartment | 1719637200000 | 2024 | 6 | 29 | 23 | Assault |
| 147547 | GO-20241427869 | York University Heights (27) | 27 | 43.759469 | -79.464942 | Commercial | 1719723600000 | 2024 | 6 | 30 | 18 | Theft Over |
| 147548 | GO-20241423116 | Morningside Heights (144) | 144 | 43.837237 | -79.248477 | Outside | 1719637200000 | 2024 | 6 | 29 | 21 | Assault |
| 147549 | GO-20241426669 | Mimico-Queensway (160) | 160 | 43.616490 | -79.521053 | Outside | 1718859600000 | 2024 | 6 | 20 | 13 | Theft Over |
| 147550 | GO-20241425318 | New Toronto (18) | 18 | 43.598831 | -79.513940 | House | 1719637200000 | 2024 | 6 | 29 | 20 | Assault |
147551 rows × 12 columns
One-hot encoding the target variable (one 0/1 column per MCI category):
| | Assault | Auto Theft | Break and Enter | Robbery | Theft Over |
|---|---|---|---|---|---|
| 0 | 0 | 1 | 0 | 0 | 0 |
| 1 | 0 | 1 | 0 | 0 | 0 |
| 2 | 0 | 1 | 0 | 0 | 0 |
| 3 | 0 | 1 | 0 | 0 | 0 |
| 4 | 0 | 1 | 0 | 0 | 0 |
| ... | ... | ... | ... | ... | ... |
| 147546 | 1 | 0 | 0 | 0 | 0 |
| 147547 | 0 | 0 | 0 | 0 | 1 |
| 147548 | 1 | 0 | 0 | 0 | 0 |
| 147549 | 0 | 0 | 0 | 0 | 1 |
| 147550 | 1 | 0 | 0 | 0 | 0 |
147551 rows × 5 columns
Joining the encoded target columns to the DataFrame and dropping the original, unencoded MCI_CATEGORY column:
| | EVENT_UNIQUE_ID | NEIGHBOURHOOD_158 | HOOD_158 | LAT_WGS84 | LONG_WGS84 | PREMISES_TYPE | OCC_DATE | OCC_YEAR | OCC_MONTH | OCC_DAY | OCC_HOUR | Assault | Auto Theft | Break and Enter | Robbery | Theft Over |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | GO-20213605 | Wychwood (94) | 94 | 43.681769 | -79.425762 | Outside | 1609477200000 | 2021 | 1 | 1 | 16 | 0 | 1 | 0 | 0 | 0 |
| 1 | GO-20213400 | NSA | 0 | 0.000000 | 0.000000 | Commercial | 1609477200000 | 2021 | 1 | 1 | 4 | 0 | 1 | 0 | 0 | 0 |
| 2 | GO-20211123 | Yorkdale-Glen Park (31) | 31 | 43.721013 | -79.460110 | Other | 1609477200000 | 2021 | 1 | 1 | 4 | 0 | 1 | 0 | 0 | 0 |
| 3 | GO-2021445 | Yonge-Doris (151) | 151 | 43.778743 | -79.415293 | Other | 1609477200000 | 2021 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 |
| 4 | GO-20213400 | NSA | 0 | 0.000000 | 0.000000 | Commercial | 1609477200000 | 2021 | 1 | 1 | 4 | 0 | 1 | 0 | 0 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 147546 | GO-20241427047 | Cabbagetown-South St.James Town (71) | 71 | 43.663195 | -79.373043 | Apartment | 1719637200000 | 2024 | 6 | 29 | 23 | 1 | 0 | 0 | 0 | 0 |
| 147547 | GO-20241427869 | York University Heights (27) | 27 | 43.759469 | -79.464942 | Commercial | 1719723600000 | 2024 | 6 | 30 | 18 | 0 | 0 | 0 | 0 | 1 |
| 147548 | GO-20241423116 | Morningside Heights (144) | 144 | 43.837237 | -79.248477 | Outside | 1719637200000 | 2024 | 6 | 29 | 21 | 1 | 0 | 0 | 0 | 0 |
| 147549 | GO-20241426669 | Mimico-Queensway (160) | 160 | 43.616490 | -79.521053 | Outside | 1718859600000 | 2024 | 6 | 20 | 13 | 0 | 0 | 0 | 0 | 1 |
| 147550 | GO-20241425318 | New Toronto (18) | 18 | 43.598831 | -79.513940 | House | 1719637200000 | 2024 | 6 | 29 | 20 | 1 | 0 | 0 | 0 | 0 |
147551 rows × 16 columns
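A minimal sketch of the one-hot encoding and the join, on a three-row toy DataFrame. pd.get_dummies creates one indicator column per category in sorted order, which matches the column order Assault, Auto Theft, Break and Enter, Robbery, Theft Over shown above; whether the project used get_dummies or another encoder is not stated, so treat this as one possible approach.

```python
import pandas as pd

df = pd.DataFrame({
    "HOOD_158": [94, 0, 71],
    "OCC_MONTH": [1, 1, 6],
    "MCI_CATEGORY": ["Auto Theft", "Auto Theft", "Assault"],
})

# One 0/1 indicator column per category (columns come out in sorted order)
targets = pd.get_dummies(df["MCI_CATEGORY"], dtype=int)

# Attach the indicators on the shared row index and drop the original label column
df = df.join(targets).drop(columns="MCI_CATEGORY")
print(df)
```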
Features indexed by EVENT_UNIQUE_ID, with the rows grouped by occurrence (129,217 rows, down from 147,551):
| EVENT_UNIQUE_ID | NEIGHBOURHOOD_158 | HOOD_158 | LAT_WGS84 | LONG_WGS84 | PREMISES_TYPE | OCC_DATE | OCC_YEAR | OCC_MONTH | OCC_DAY | OCC_HOUR |
|---|---|---|---|---|---|---|---|---|---|---|
| GO-20211000033 | West Queen West (162) | 162 | 43.646286 | -79.408568 | Commercial | 1622264400000 | 2021 | 5 | 29 | 21 |
| GO-2021100004 | Morningside Heights (144) | 144 | 43.807252 | -79.162903 | Outside | 1610773200000 | 2021 | 1 | 16 | 17 |
| GO-20211000054 | Moss Park (73) | 73 | 43.657067 | -79.374531 | Apartment | 1622264400000 | 2021 | 5 | 29 | 22 |
| GO-20211000193 | Fort York-Liberty Village (163) | 163 | 43.636618 | -79.399704 | Apartment | 1622264400000 | 2021 | 5 | 29 | 23 |
| GO-20211000248 | Eglinton East (138) | 138 | 43.737099 | -79.246230 | Outside | 1622264400000 | 2021 | 5 | 29 | 21 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| GO-20249997 | Junction-Wallace Emerson (171) | 171 | 43.668917 | -79.442637 | Outside | 1704085200000 | 2024 | 1 | 1 | 18 |
| GO-202499972 | Edenbridge-Humber Valley (9) | 9 | 43.672705 | -79.522472 | House | 1705208400000 | 2024 | 1 | 14 | 3 |
| GO-2024999786 | Flemingdon Park (44) | 44 | 43.718727 | -79.334948 | Apartment | 1714539600000 | 2024 | 5 | 1 | 0 |
| GO-2024999795 | Oakridge (121) | 121 | 43.691225 | -79.288346 | Commercial | 1715230800000 | 2024 | 5 | 9 | 13 |
| GO-2024999882 | Eglinton East (138) | 138 | 43.738856 | -79.238421 | Commercial | 1715230800000 | 2024 | 5 | 9 | 14 |
129217 rows × 10 columns
Target columns indexed by EVENT_UNIQUE_ID:
| EVENT_UNIQUE_ID | Assault | Auto Theft | Break and Enter | Robbery | Theft Over |
|---|---|---|---|---|---|
| GO-20211000033 | 0 | 0 | 1 | 0 | 0 |
| GO-2021100004 | 0 | 1 | 0 | 0 | 0 |
| GO-20211000054 | 1 | 0 | 0 | 0 | 0 |
| GO-20211000193 | 1 | 0 | 0 | 0 | 0 |
| GO-20211000248 | 1 | 0 | 0 | 0 | 0 |
| ... | ... | ... | ... | ... | ... |
| GO-20249997 | 0 | 1 | 0 | 0 | 0 |
| GO-202499972 | 0 | 1 | 0 | 0 | 0 |
| GO-2024999786 | 1 | 0 | 0 | 0 | 0 |
| GO-2024999795 | 1 | 0 | 0 | 0 | 0 |
| GO-2024999882 | 1 | 0 | 0 | 0 | 0 |
129217 rows × 5 columns
Resetting the index to combine the features and targets into a single DataFrame:
| | EVENT_UNIQUE_ID | NEIGHBOURHOOD_158 | HOOD_158 | LAT_WGS84 | LONG_WGS84 | PREMISES_TYPE | OCC_DATE | OCC_YEAR | OCC_MONTH | OCC_DAY | OCC_HOUR | Assault | Auto Theft | Break and Enter | Robbery | Theft Over |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | GO-20211000033 | West Queen West (162) | 162 | 43.646286 | -79.408568 | Commercial | 1622264400000 | 2021 | 5 | 29 | 21 | 0 | 0 | 1 | 0 | 0 |
| 1 | GO-2021100004 | Morningside Heights (144) | 144 | 43.807252 | -79.162903 | Outside | 1610773200000 | 2021 | 1 | 16 | 17 | 0 | 1 | 0 | 0 | 0 |
| 2 | GO-20211000054 | Moss Park (73) | 73 | 43.657067 | -79.374531 | Apartment | 1622264400000 | 2021 | 5 | 29 | 22 | 1 | 0 | 0 | 0 | 0 |
| 3 | GO-20211000193 | Fort York-Liberty Village (163) | 163 | 43.636618 | -79.399704 | Apartment | 1622264400000 | 2021 | 5 | 29 | 23 | 1 | 0 | 0 | 0 | 0 |
| 4 | GO-20211000248 | Eglinton East (138) | 138 | 43.737099 | -79.246230 | Outside | 1622264400000 | 2021 | 5 | 29 | 21 | 1 | 0 | 0 | 0 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 129212 | GO-20249997 | Junction-Wallace Emerson (171) | 171 | 43.668917 | -79.442637 | Outside | 1704085200000 | 2024 | 1 | 1 | 18 | 0 | 1 | 0 | 0 | 0 |
| 129213 | GO-202499972 | Edenbridge-Humber Valley (9) | 9 | 43.672705 | -79.522472 | House | 1705208400000 | 2024 | 1 | 14 | 3 | 0 | 1 | 0 | 0 | 0 |
| 129214 | GO-2024999786 | Flemingdon Park (44) | 44 | 43.718727 | -79.334948 | Apartment | 1714539600000 | 2024 | 5 | 1 | 0 | 1 | 0 | 0 | 0 | 0 |
| 129215 | GO-2024999795 | Oakridge (121) | 121 | 43.691225 | -79.288346 | Commercial | 1715230800000 | 2024 | 5 | 9 | 13 | 1 | 0 | 0 | 0 | 0 |
| 129216 | GO-2024999882 | Eglinton East (138) | 138 | 43.738856 | -79.238421 | Commercial | 1715230800000 | 2024 | 5 | 9 | 14 | 1 | 0 | 0 | 0 | 0 |
129217 rows × 16 columns
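The tables above go from 147,551 rows to 129,217 rows indexed by EVENT_UNIQUE_ID, which is consistent with collapsing the several rows that one occurrence can have (as the dataset notes explain) into a single row. The sketch below shows one way to do that on toy data; taking the first value for the features and the maximum for the one-hot targets are assumptions, not the project's confirmed choices.

```python
import pandas as pd

# Toy data: occurrence GO-1 was filed under two MCI categories, so it has two rows
df = pd.DataFrame({
    "EVENT_UNIQUE_ID": ["GO-1", "GO-1", "GO-2"],
    "HOOD_158": [94, 94, 31],
    "OCC_MONTH": [1, 1, 6],
    "Assault": [1, 0, 0],
    "Robbery": [0, 1, 0],
    "Auto Theft": [0, 0, 1],
})
feature_cols = ["HOOD_158", "OCC_MONTH"]
target_cols = ["Assault", "Robbery", "Auto Theft"]

grouped = df.groupby("EVENT_UNIQUE_ID")   # group keys become the sorted index
features = grouped[feature_cols].first()  # occurrence-level attributes: keep the first row
targets = grouped[target_cols].max()      # 1 if any row of the occurrence had that category

# Put features and targets side by side, then move EVENT_UNIQUE_ID back into a column
collapsed = features.join(targets).reset_index()
print(collapsed)
```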
Create the Model
Creating the Testing and Training Datasets - First Approach
Adding a Total_Count column in which each occurrence counts as 1:
| | EVENT_UNIQUE_ID | NEIGHBOURHOOD_158 | HOOD_158 | LAT_WGS84 | LONG_WGS84 | PREMISES_TYPE | OCC_DATE | OCC_YEAR | OCC_MONTH | OCC_DAY | OCC_HOUR | Assault | Auto Theft | Break and Enter | Robbery | Theft Over | Total_Count |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | GO-20211000033 | West Queen West (162) | 162 | 43.646286 | -79.408568 | Commercial | 1622264400000 | 2021 | 5 | 29 | 21 | 0 | 0 | 1 | 0 | 0 | 1 |
| 1 | GO-2021100004 | Morningside Heights (144) | 144 | 43.807252 | -79.162903 | Outside | 1610773200000 | 2021 | 1 | 16 | 17 | 0 | 1 | 0 | 0 | 0 | 1 |
| 2 | GO-20211000054 | Moss Park (73) | 73 | 43.657067 | -79.374531 | Apartment | 1622264400000 | 2021 | 5 | 29 | 22 | 1 | 0 | 0 | 0 | 0 | 1 |
| 3 | GO-20211000193 | Fort York-Liberty Village (163) | 163 | 43.636618 | -79.399704 | Apartment | 1622264400000 | 2021 | 5 | 29 | 23 | 1 | 0 | 0 | 0 | 0 | 1 |
| 4 | GO-20211000248 | Eglinton East (138) | 138 | 43.737099 | -79.246230 | Outside | 1622264400000 | 2021 | 5 | 29 | 21 | 1 | 0 | 0 | 0 | 0 | 1 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 129212 | GO-20249997 | Junction-Wallace Emerson (171) | 171 | 43.668917 | -79.442637 | Outside | 1704085200000 | 2024 | 1 | 1 | 18 | 0 | 1 | 0 | 0 | 0 | 1 |
| 129213 | GO-202499972 | Edenbridge-Humber Valley (9) | 9 | 43.672705 | -79.522472 | House | 1705208400000 | 2024 | 1 | 14 | 3 | 0 | 1 | 0 | 0 | 0 | 1 |
| 129214 | GO-2024999786 | Flemingdon Park (44) | 44 | 43.718727 | -79.334948 | Apartment | 1714539600000 | 2024 | 5 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 1 |
| 129215 | GO-2024999795 | Oakridge (121) | 121 | 43.691225 | -79.288346 | Commercial | 1715230800000 | 2024 | 5 | 9 | 13 | 1 | 0 | 0 | 0 | 0 | 1 |
| 129216 | GO-2024999882 | Eglinton East (138) | 138 | 43.738856 | -79.238421 | Commercial | 1715230800000 | 2024 | 5 | 9 | 14 | 1 | 0 | 0 | 0 | 0 | 1 |
129217 rows × 17 columns
Summing Total_Count by neighbourhood code (HOOD_158), year, and month:
| | HOOD_158 | OCC_YEAR | OCC_MONTH | Total_Count |
|---|---|---|---|---|
| 0 | 0 | 2021 | 1 | 44 |
| 1 | 0 | 2021 | 2 | 34 |
| 2 | 0 | 2021 | 3 | 45 |
| 3 | 0 | 2021 | 4 | 26 |
| 4 | 0 | 2021 | 5 | 37 |
| ... | ... | ... | ... | ... |
| 6672 | 174 | 2024 | 2 | 15 |
| 6673 | 174 | 2024 | 3 | 7 |
| 6674 | 174 | 2024 | 4 | 17 |
| 6675 | 174 | 2024 | 5 | 12 |
| 6676 | 174 | 2024 | 6 | 13 |
6677 rows × 4 columns
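A sketch of the two steps above on toy data: a helper Total_Count column set to 1 per occurrence row, then summed per neighbourhood, year, and month. With as_index=False the group keys stay as ordinary columns and the result gets a fresh 0..n-1 index, as in the table.

```python
import pandas as pd

occurrences = pd.DataFrame({
    "HOOD_158": [0, 0, 0, 174],
    "OCC_YEAR": [2021, 2021, 2021, 2024],
    "OCC_MONTH": [1, 1, 2, 6],
})
# Each row is one occurrence, so it contributes 1 to the count
occurrences["Total_Count"] = 1

keys = ["HOOD_158", "OCC_YEAR", "OCC_MONTH"]
monthly = occurrences.groupby(keys, as_index=False)["Total_Count"].sum()
print(monthly)
```

`occurrences.groupby(keys).size()` would give the same counts without the helper column.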
Monthly counts per neighbourhood for each MCI category, sorted by Total_Count in ascending order:
| | HOOD_158 | OCC_YEAR | OCC_MONTH | Assault | Auto Theft | Break and Enter | Robbery | Theft Over | Total_Count |
|---|---|---|---|---|---|---|---|---|---|
| 1895 | 49 | 2021 | 6 | 1 | 0 | 1 | 0 | 0 | 1 |
| 2229 | 58 | 2021 | 4 | 0 | 0 | 1 | 0 | 0 | 1 |
| 1909 | 49 | 2022 | 8 | 0 | 1 | 0 | 0 | 0 | 1 |
| 5208 | 140 | 2021 | 2 | 1 | 0 | 0 | 0 | 0 | 1 |
| 1890 | 49 | 2021 | 1 | 0 | 0 | 1 | 0 | 0 | 1 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 77 | 1 | 2023 | 12 | 30 | 45 | 28 | 4 | 13 | 118 |
| 68 | 1 | 2023 | 3 | 18 | 83 | 12 | 3 | 3 | 119 |
| 65 | 1 | 2022 | 12 | 24 | 67 | 15 | 4 | 10 | 120 |
| 71 | 1 | 2023 | 6 | 24 | 77 | 22 | 4 | 5 | 131 |
| 66 | 1 | 2023 | 1 | 18 | 91 | 14 | 3 | 7 | 133 |
6677 rows × 9 columns
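A sketch of how the per-category monthly table and its two orderings might be produced, on toy data with only two categories: summing every numeric column per group, sorting by Total_Count, and then restoring the original order with sort_index().

```python
import pandas as pd

occurrences = pd.DataFrame({
    "HOOD_158": [0, 0, 0, 1],
    "OCC_YEAR": [2021, 2021, 2021, 2021],
    "OCC_MONTH": [1, 1, 1, 1],
    "Assault": [1, 1, 0, 0],
    "Auto Theft": [0, 0, 1, 1],
    "Total_Count": [1, 1, 1, 1],
})
keys = ["HOOD_158", "OCC_YEAR", "OCC_MONTH"]

# Sum each category and the total per neighbourhood-month; keys stay as columns
monthly = occurrences.groupby(keys, as_index=False).sum()

by_count = monthly.sort_values("Total_Count")  # ascending: quietest neighbourhood-months first
by_key = by_count.sort_index()                 # back to neighbourhood / year / month order
print(by_count)
print(by_key)
```

sort_values uses an unstable sort by default, so rows with equal counts (such as the many 1s at the top of the sorted table) can appear in any order; passing kind="stable" makes the order reproducible.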
The same monthly counts ordered by neighbourhood, year, and month:
| | HOOD_158 | OCC_YEAR | OCC_MONTH | Assault | Auto Theft | Break and Enter | Robbery | Theft Over | Total_Count |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 2021 | 1 | 23 | 2 | 10 | 9 | 1 | 44 |
| 1 | 0 | 2021 | 2 | 24 | 1 | 8 | 1 | 0 | 34 |
| 2 | 0 | 2021 | 3 | 27 | 7 | 5 | 5 | 3 | 45 |
| 3 | 0 | 2021 | 4 | 16 | 1 | 3 | 3 | 3 | 26 |
| 4 | 0 | 2021 | 5 | 30 | 3 | 3 | 1 | 1 | 37 |
| 5 | 0 | 2021 | 6 | 22 | 3 | 3 | 2 | 3 | 32 |
| 6 | 0 | 2021 | 7 | 29 | 2 | 3 | 6 | 2 | 42 |
| 7 | 0 | 2021 | 8 | 48 | 6 | 2 | 7 | 3 | 66 |
| 8 | 0 | 2021 | 9 | 27 | 8 | 6 | 3 | 3 | 47 |
| 9 | 0 | 2021 | 10 | 39 | 9 | 15 | 2 | 2 | 65 |
| 10 | 0 | 2021 | 11 | 26 | 10 | 6 | 4 | 4 | 50 |
| 11 | 0 | 2021 | 12 | 22 | 5 | 8 | 2 | 1 | 37 |
| 12 | 0 | 2022 | 1 | 32 | 7 | 6 | 3 | 2 | 50 |
| 13 | 0 | 2022 | 2 | 30 | 10 | 3 | 1 | 1 | 45 |
| 14 | 0 | 2022 | 3 | 34 | 1 | 6 | 7 | 3 | 50 |
| 15 | 0 | 2022 | 4 | 28 | 8 | 3 | 11 | 3 | 52 |
| 16 | 0 | 2022 | 5 | 27 | 13 | 6 | 2 | 10 | 56 |
| 17 | 0 | 2022 | 6 | 25 | 7 | 4 | 4 | 2 | 41 |
| 18 | 0 | 2022 | 7 | 27 | 10 | 6 | 6 | 2 | 51 |
| 19 | 0 | 2022 | 8 | 28 | 6 | 6 | 6 | 6 | 52 |
| 20 | 0 | 2022 | 9 | 36 | 19 | 7 | 12 | 1 | 74 |
| 21 | 0 | 2022 | 10 | 35 | 10 | 5 | 7 | 2 | 58 |
| 22 | 0 | 2022 | 11 | 37 | 12 | 5 | 6 | 3 | 62 |
| 23 | 0 | 2022 | 12 | 25 | 12 | 2 | 4 | 1 | 43 |
| 24 | 0 | 2023 | 1 | 24 | 9 | 2 | 4 | 4 | 43 |
| 25 | 0 | 2023 | 2 | 17 | 12 | 1 | 5 | 2 | 37 |
| 26 | 0 | 2023 | 3 | 20 | 14 | 1 | 1 | 1 | 37 |
| 27 | 0 | 2023 | 4 | 17 | 12 | 1 | 2 | 1 | 33 |
| 28 | 0 | 2023 | 5 | 22 | 6 | 0 | 4 | 1 | 32 |
| 29 | 0 | 2023 | 6 | 13 | 13 | 0 | 0 | 1 | 27 |
| 30 | 0 | 2023 | 7 | 27 | 12 | 1 | 3 | 0 | 41 |
| 31 | 0 | 2023 | 8 | 15 | 11 | 1 | 1 | 1 | 29 |
| 32 | 0 | 2023 | 9 | 15 | 16 | 1 | 2 | 5 | 39 |
| 33 | 0 | 2023 | 10 | 20 | 14 | 0 | 1 | 1 | 36 |
| 34 | 0 | 2023 | 11 | 25 | 10 | 3 | 2 | 1 | 40 |
| 35 | 0 | 2023 | 12 | 24 | 5 | 0 | 6 | 0 | 34 |
| 36 | 0 | 2024 | 1 | 21 | 4 | 2 | 4 | 2 | 33 |
| 37 | 0 | 2024 | 2 | 17 | 4 | 0 | 4 | 1 | 26 |
| 38 | 0 | 2024 | 3 | 19 | 7 | 4 | 1 | 3 | 33 |
| 39 | 0 | 2024 | 4 | 19 | 5 | 1 | 1 | 0 | 26 |
| 40 | 0 | 2024 | 5 | 17 | 12 | 4 | 2 | 2 | 36 |
| 41 | 0 | 2024 | 6 | 20 | 7 | 0 | 4 | 1 | 31 |
|  | HOOD_158 | OCC_YEAR | OCC_MONTH | Assault | Auto Theft | Break and Enter | Robbery | Theft Over | Total_Count |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 2021 | 1 | 18 | 35 | 7 | 1 | 3 | 62 |
| 1 | 1 | 2021 | 2 | 17 | 17 | 5 | 1 | 3 | 43 |
| 2 | 1 | 2021 | 3 | 15 | 20 | 8 | 6 | 6 | 54 |
| 3 | 1 | 2021 | 4 | 11 | 31 | 4 | 2 | 4 | 52 |
| 4 | 1 | 2021 | 5 | 18 | 26 | 9 | 5 | 4 | 62 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 6630 | 174 | 2024 | 2 | 9 | 0 | 5 | 1 | 1 | 15 |
| 6631 | 174 | 2024 | 3 | 6 | 1 | 0 | 0 | 0 | 7 |
| 6632 | 174 | 2024 | 4 | 12 | 2 | 2 | 0 | 1 | 17 |
| 6633 | 174 | 2024 | 5 | 8 | 1 | 2 | 0 | 1 | 12 |
| 6634 | 174 | 2024 | 6 | 6 | 4 | 1 | 1 | 1 | 13 |
6635 rows × 9 columns
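The neighbourhood-month tables above can be produced by summing per-incident category indicators. Note also that the row count drops from 6677 to 6635, a difference of 42 rows, which matches the 42 monthly rows listed for HOOD_158 = 0; this suggests those rows were excluded before modelling. A minimal pandas sketch of both steps, assuming the incident-level frame carries one 0/1 indicator column per crime category (the column names come from the tables above; the function name `monthly_counts` and the toy rows are hypothetical):

```python
import pandas as pd

CATEGORIES = ["Assault", "Auto Theft", "Break and Enter", "Robbery", "Theft Over"]

def monthly_counts(raw: pd.DataFrame) -> pd.DataFrame:
    """Sum per-incident 0/1 category indicators into neighbourhood-month counts."""
    counts = (
        raw.groupby(["HOOD_158", "OCC_YEAR", "OCC_MONTH"], as_index=False)[CATEGORIES]
        .sum()
    )
    counts["Total_Count"] = counts[CATEGORIES].sum(axis=1)
    # Assumed filter: drop HOOD_158 == 0, which would explain the 42 removed rows
    return counts[counts["HOOD_158"] != 0].reset_index(drop=True)

# Hypothetical toy incidents: one in HOOD_158 0, two in HOOD_158 1
raw = pd.DataFrame({
    "HOOD_158": [0, 1, 1],
    "OCC_YEAR": [2021, 2021, 2021],
    "OCC_MONTH": [1, 1, 1],
    "Assault": [1, 1, 0],
    "Auto Theft": [0, 0, 1],
    "Break and Enter": [0, 0, 0],
    "Robbery": [0, 0, 0],
    "Theft Over": [0, 0, 0],
})
counts = monthly_counts(raw)
print(counts)
```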
Testing Different Models - First Approach¶
RandomForestRegressor()
MultiOutputRegressor(estimator=HistGradientBoostingRegressor())
Lasso()
ExtraTreesRegressor()
KNeighborsRegressor()
ElasticNet()
RadiusNeighborsRegressor()
/usr/local/lib/python3.10/dist-packages/numpy/core/numeric.py:407: RuntimeWarning: invalid value encountered in cast multiarray.copyto(res, fill_value, casting='unsafe')
| Model | Mean Squared Error | R-squared | Mean Absolute Error |
|---|---|---|---|
| Random Forests | 16.345516578762318 | 0.42185755406596 | 2.507420886075949 |
| Histogram-Based Gradient Boosting | 13.53697402119441 | 0.5401264890283979 | 2.3624557106149555 |
| Lasso Regressor | 56.06504159690181 | 0.0008593089048466821 | 4.053649104634542 |
| Extra-Trees Regressor | 21.009235548523208 | 0.2633688550325657 | 2.8370112517580868 |
| K-Nearest Neighbors Regressor | 22.812032348804497 | 0.3917381351580209 | 2.9212025316455694 |
| Elastic Net Regressor | 55.83116626511688 | 0.0018104521500861837 | 4.0478073837331445 |
| Radius Neighbors Regressor | 8.973691110784241e+34 | -1.48166511902087e+34 | 9729295397526136.0 |
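Every model above is scored with the same three metrics. A small helper of the following shape keeps that output consistent across models; it is a sketch assuming `y_true` and `y_pred` are arrays of equal shape, and the name `report` is hypothetical. For multi-output targets (one column per crime category), scikit-learn averages each of these metrics uniformly across the output columns by default.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

def report(name, y_true, y_pred):
    """Compute MSE, R-squared and MAE for one model and print them on one line."""
    mse = mean_squared_error(y_true, y_pred)
    r2 = r2_score(y_true, y_pred)
    mae = mean_absolute_error(y_true, y_pred)
    print(f"{name}: MSE={mse:.3f}  R-squared={r2:.3f}  MAE={mae:.3f}")
    return mse, r2, mae

# Toy check: two target columns (e.g. two crime categories), three test rows
y_true = np.array([[3, 1], [5, 0], [2, 2]])
y_pred = np.array([[4, 1], [5, 1], [2, 2]])
scores = report("toy model", y_true, y_pred)
```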
Optimizing the models - First Approach¶
Perform Hyper-Parameter Tuning on the Models¶
Hyper-parameter tuning was performed on the following models because they achieved the three highest R-squared scores in the comparison above:
- Random Forest Regressor
- Histogram-Based Gradient Boosting
- K-Nearest Neighbors Regressor
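The "Best Hyperparameters" dictionaries below have the shape of `best_params_` from scikit-learn's `RandomizedSearchCV`, and the KNN run further down emits that class's `n_iter` warning, so a randomized search is likely what was used. A minimal sketch on synthetic data follows; the search space, `n_iter`, `cv` and the synthetic dataset are assumptions for illustration, not the values used in this project.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV, train_test_split

# Synthetic stand-in for the crime data: 5 targets, like the 5 MCI categories
X, y = make_regression(n_samples=200, n_features=3, n_targets=5, noise=5.0, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

# Hypothetical search space over the parameters reported below
param_distributions = {
    "n_estimators": [50, 100],
    "max_depth": [5, 10, None],
    "min_samples_split": [2, 10, 15],
    "min_samples_leaf": [1, 2, 4],
}

search = RandomizedSearchCV(
    RandomForestRegressor(random_state=1),  # RF handles multi-output targets natively
    param_distributions=param_distributions,
    n_iter=5,        # number of sampled candidates
    cv=3,            # folds per candidate
    scoring="r2",
    random_state=1,
)
search.fit(X_train, y_train)
print("Best Hyperparameters:", search.best_params_)
print("Test R-squared:", search.score(X_test, y_test))
```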
Iterations for Random Forest (RF) Regressor¶
Best Hyperparameters: {'n_estimators': 100, 'min_samples_split': 10, 'min_samples_leaf': 2, 'max_depth': 10}
Mean Squared Error: 17.928958393105038
R-squared: 0.49663590673300156
mean absolute error: 2.6323390583657864
Best Hyperparameters: {'n_estimators': 100, 'min_samples_split': 10, 'min_samples_leaf': 2, 'max_depth': 10}
Mean Squared Error: 17.928958393105038
R-squared: 0.49663590673300156
mean absolute error: 2.6323390583657864
Best Hyperparameters: {'n_estimators': 100, 'min_samples_split': 15, 'min_samples_leaf': 2, 'max_depth': 10}
Mean Squared Error: 17.900895599495165
R-squared: 0.49718000005206253
mean absolute error: 2.6397685170607654
Best Hyperparameters: {'n_estimators': 100, 'min_samples_split': 10, 'min_samples_leaf': 4, 'max_depth': 5}
Mean Squared Error: 37.38108382013141
R-squared: 0.24787742552140726
mean absolute error: 3.5582910596792368
Best Hyperparameters: {'n_estimators': 100, 'min_samples_split': 15, 'min_samples_leaf': 2, 'max_depth': 10}
Mean Squared Error: 17.900895599495165
R-squared: 0.49718000005206253
mean absolute error: 2.6397685170607654
Best Hyperparameters: {'n_estimators': 100, 'min_samples_split': 15, 'min_samples_leaf': 2, 'max_depth': 10}
Mean Squared Error: 17.900895599495165
R-squared: 0.49718000005206253
mean absolute error: 2.6397685170607654
Best Hyperparameters: {'n_estimators': 100, 'min_samples_split': 15, 'min_samples_leaf': 2, 'max_depth': 10}
Mean Squared Error: 17.900895599495165
R-squared: 0.49718000005206253
mean absolute error: 2.6397685170607654
Best Hyperparameters: {'n_estimators': 300, 'min_samples_split': 15, 'min_samples_leaf': 2, 'max_depth': 10}
Mean Squared Error: 18.358640232112
R-squared: 0.49228192721420205
mean absolute error: 2.66464858496764
Best Hyperparameters: {'n_estimators': 250, 'min_samples_split': 15, 'min_samples_leaf': 2, 'max_depth': 10}
Mean Squared Error: 18.348343979326263
R-squared: 0.49257004515939246
mean absolute error: 2.6625904431614718
Best Hyperparameters: {'n_estimators': 450, 'min_samples_split': 15, 'min_samples_leaf': 8, 'max_depth': 10}
Mean Squared Error: 18.777789215683814
R-squared: 0.485768406447904
mean absolute error: 2.6973421196246625
Best Hyperparameters: {'n_estimators': 400, 'min_samples_split': 55, 'min_samples_leaf': 2, 'max_depth': 10}
Mean Squared Error: 23.92497566088106
R-squared: 0.4239082506998882
mean absolute error: 2.9652493531921866
Best Hyperparameters: {'n_estimators': 100, 'min_samples_split': 15, 'min_samples_leaf': 2, 'max_depth': 10}
Mean Squared Error: 17.900895599495165
R-squared: 0.49718000005206253
mean absolute error: 2.6397685170607654
Iterations for Histogram-Based Gradient Boosting (HBGB)¶
Best Hyperparameters: {'estimator__max_depth': 7, 'estimator__learning_rate': 0.01, 'estimator__l2_regularization': 0.2}
Mean Squared Error: 32.18282924168508
R-squared: 0.3320401942652549
mean absolute error: 3.2649741520630413
Best Hyperparameters: {'estimator__min_samples_leaf': 40, 'estimator__max_depth': None, 'estimator__learning_rate': 0.01, 'estimator__l2_regularization': 0.0}
Mean Squared Error: 34.768348922898916
R-squared: 0.28056428760433166
mean absolute error: 3.322235227775795
Best Hyperparameters: {'estimator__min_samples_leaf': 10, 'estimator__max_depth': None, 'estimator__learning_rate': 0.01, 'estimator__l2_regularization': 0.1}
Mean Squared Error: 27.707811341396553
R-squared: 0.39456052080811127
mean absolute error: 3.123993487397762
Iterations for K-Nearest Neighbors (KNN) Regressor¶
Best Hyperparameters: {'weights': 'distance', 'p': 1, 'n_neighbors': 9}
Mean Squared Error: 18.85456904070352
R-squared: 0.46409840345829956
mean absolute error: 2.643735813566883
/usr/local/lib/python3.10/dist-packages/sklearn/model_selection/_search.py:320: UserWarning: The total space of parameters 16 is smaller than n_iter=20. Running 16 iterations. For exhaustive searches, use GridSearchCV. warnings.warn(
Best Hyperparameters: {'weights': 'distance', 'p': 1, 'n_neighbors': 9}
Mean Squared Error: 18.85456904070352
R-squared: 0.46409840345829956
mean absolute error: 2.643735813566883
Best Hyperparameters: {'weights': 'distance', 'p': 1, 'n_neighbors': 11}
Mean Squared Error: 19.612298984089254
R-squared: 0.46018526487952816
mean absolute error: 2.677782519371975
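The `UserWarning` above is scikit-learn noting that the KNN search space contains only 16 combinations, fewer than `n_iter=20`. One grid consistent with that count (2 `weights` options × 2 `p` values × 4 `n_neighbors` values) is sketched below; the exact values are assumptions chosen to include the reported best settings. When the space is this small, `GridSearchCV` evaluates every combination exhaustively, as the warning suggests.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV, ParameterGrid
from sklearn.neighbors import KNeighborsRegressor

# Hypothetical KNN grid: 2 * 2 * 4 = 16 combinations
knn_grid = {
    "weights": ["uniform", "distance"],
    "p": [1, 2],                       # 1 = Manhattan, 2 = Euclidean distance
    "n_neighbors": [5, 7, 9, 11],
}
n_candidates = len(ParameterGrid(knn_grid))
print(n_candidates)  # 16

# Exhaustive search over all 16 combinations on synthetic multi-output data
X, y = make_regression(n_samples=120, n_features=3, n_targets=5, random_state=0)
knn_search = GridSearchCV(KNeighborsRegressor(), knn_grid, cv=5).fit(X, y)
print(knn_search.best_params_)
```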
Apply Boosting Algorithms¶
MultiOutputRegressor(estimator=AdaBoostRegressor(estimator=HistGradientBoostingRegressor(random_state=1), random_state=1))
AdaBoostRegressor Mean Squared Error: 15.514865328785016 AdaBoostRegressor R-squared: 0.4603984456235 AdaBoostRegressor Mean Absolute Error: 2.5524177817460036
RegressorChain(base_estimator=RandomForestRegressor(max_depth=10, min_samples_leaf=2, min_samples_split=15, random_state=1))
Regression Chain Model Mean Squared Error: 14.907590528079728 Regression Chain Model R-squared: 0.5211560076522394 Regression Chain Model Mean Absolute Error: 2.4530789200162197
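Strictly speaking, `RegressorChain` is a multi-output wrapper rather than a boosting method: it fits one model per target in a fixed order and appends the earlier targets as extra input features for the later models (true values during fitting, predictions at inference time). A sketch using the tuned RF settings shown above on synthetic data; `n_estimators=50` and the explicit `order` are assumptions.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.multioutput import RegressorChain

X, y = make_regression(n_samples=200, n_features=4, n_targets=5, noise=5.0, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

rf = RandomForestRegressor(max_depth=10, min_samples_leaf=2, min_samples_split=15,
                           n_estimators=50, random_state=1)
# The base model is passed positionally to avoid depending on the keyword name
chain = RegressorChain(rf, order=[0, 1, 2, 3, 4])
chain.fit(X_train, y_train)

# The k-th model in the chain sees the 4 original features plus k earlier targets
print([m.n_features_in_ for m in chain.estimators_])  # [4, 5, 6, 7, 8]
print(chain.predict(X_test).shape)  # (40, 5)
```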
Creating the Testing and Training Datasets - Second Approach¶
Testing Different Models - Second Approach¶
Predicting Total Counts¶
/usr/local/lib/python3.10/dist-packages/sklearn/base.py:1473: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples,), for example using ravel(). return fit_method(estimator, *args, **kwargs)
RandomForestRegressor()
/usr/local/lib/python3.10/dist-packages/sklearn/utils/validation.py:1339: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel(). y = column_or_1d(y, warn=True)
HistGradientBoostingRegressor()
Lasso()
/usr/local/lib/python3.10/dist-packages/sklearn/base.py:1473: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples,), for example using ravel(). return fit_method(estimator, *args, **kwargs)
ExtraTreesRegressor()
KNeighborsRegressor()
ElasticNet()
RadiusNeighborsRegressor()
/usr/local/lib/python3.10/dist-packages/numpy/core/numeric.py:407: RuntimeWarning: invalid value encountered in cast multiarray.copyto(res, fill_value, casting='unsafe')
| Model | Mean Squared Error | R-squared | Mean Absolute Error |
|---|---|---|---|
| Random Forest | 47.507770147679324 | 0.7805812475805141 | 5.139229957805908 |
| Histogram-Based Gradient Boosting | 42.3805269624916 | 0.8042618644469353 | 5.0338833270972545 |
| Lasso | 215.0452817500199 | 0.006794735079959646 | 10.677136243599007 |
| Extra-Trees | 61.829009704641344 | 0.7144373619188399 | 5.8675316455696205 |
| K-Nearest Neighbors | 78.54392405063292 | 0.6372380818601189 | 6.74535864978903 |
| Elastic Net | 214.33255721979504 | 0.010086515072044833 | 10.465525613654595 |
| Radius Neighbors | 8.973691110784241e+34 | -4.144576986049706e+32 | 9729295397526140.0 |
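The repeated `DataConversionWarning` in this section means the single target was passed as a one-column 2-D array (for example, a one-column DataFrame). Selecting the column as a Series, or flattening it with `.ravel()`, gives the 1-D shape the regressors expect and silences the warning. A sketch assuming a frame with the `Total_Count` column used above; the toy values and the helper name `count_conversion_warnings` are hypothetical.

```python
import warnings

import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.exceptions import DataConversionWarning

df = pd.DataFrame({
    "HOOD_158": [1, 1, 2, 2, 3, 3],
    "OCC_YEAR": [2021, 2022, 2021, 2022, 2021, 2022],
    "OCC_MONTH": [1, 1, 1, 1, 1, 1],
    "Total_Count": [62, 50, 40, 38, 12, 15],
})
X = df[["HOOD_158", "OCC_YEAR", "OCC_MONTH"]]

y_2d = df[["Total_Count"]]        # shape (n, 1): triggers the warning
y_1d = df["Total_Count"]          # shape (n,): a Series, no warning
y_flat = y_2d.to_numpy().ravel()  # equivalent fix starting from the 2-D version

def count_conversion_warnings(target):
    """Fit a small RF on `target` and count the DataConversionWarnings raised."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        RandomForestRegressor(n_estimators=10, random_state=0).fit(X, target)
    return sum(issubclass(w.category, DataConversionWarning) for w in caught)

for label, target in [("2-D", y_2d), ("Series", y_1d), ("ravel", y_flat)]:
    print(label, count_conversion_warnings(target))
```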
Predicting Assault Counts¶
/usr/local/lib/python3.10/dist-packages/sklearn/base.py:1473: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples,), for example using ravel(). return fit_method(estimator, *args, **kwargs)
RandomForestRegressor()
/usr/local/lib/python3.10/dist-packages/sklearn/utils/validation.py:1339: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel(). y = column_or_1d(y, warn=True)
HistGradientBoostingRegressor()
Random Forests Results Mean Squared Error: 18.884251160337556 R-squared: 0.7621670878841997 mean absolute error: 3.122795358649789
Histogram-Based Gradient Boosting Results Mean Squared Error: 16.26862998275755 R-squared: 0.7951088654730386 mean absolute error: 3.003620298129256
Predicting Auto Theft Counts¶
/usr/local/lib/python3.10/dist-packages/sklearn/base.py:1473: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples,), for example using ravel(). return fit_method(estimator, *args, **kwargs)
RandomForestRegressor()
/usr/local/lib/python3.10/dist-packages/sklearn/utils/validation.py:1339: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel(). y = column_or_1d(y, warn=True)
HistGradientBoostingRegressor()
Random Forests Results Mean Squared Error: 18.344488924050633 R-squared: 0.31502817625221924 mean absolute error: 2.793428270042194
Histogram-Based Gradient Boosting Results Mean Squared Error: 12.029894217961466 R-squared: 0.5508112209565743 mean absolute error: 2.4878698854824157
Predicting Break and Enter Counts¶
/usr/local/lib/python3.10/dist-packages/sklearn/base.py:1473: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples,), for example using ravel(). return fit_method(estimator, *args, **kwargs)
RandomForestRegressor()
/usr/local/lib/python3.10/dist-packages/sklearn/utils/validation.py:1339: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel(). y = column_or_1d(y, warn=True)
HistGradientBoostingRegressor()
Random Forests Results Mean Squared Error: 8.362676476793249 R-squared: 0.28879841796872596 mean absolute error: 2.052943037974684
Histogram-Based Gradient Boosting Results Mean Squared Error: 7.185335567450399 R-squared: 0.38892506039455554 mean absolute error: 1.9032188779372148
Predicting Robbery Counts¶
/usr/local/lib/python3.10/dist-packages/sklearn/base.py:1473: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples,), for example using ravel(). return fit_method(estimator, *args, **kwargs)
RandomForestRegressor()
/usr/local/lib/python3.10/dist-packages/sklearn/utils/validation.py:1339: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel(). y = column_or_1d(y, warn=True)
HistGradientBoostingRegressor()
Random Forests Results Mean Squared Error: 2.590130696202532 R-squared: 0.21079584323330303 mean absolute error: 1.1071835443037976
Histogram-Based Gradient Boosting Results Mean Squared Error: 2.218185363946034 R-squared: 0.3241263414730897 mean absolute error: 0.9848720183410007
Predicting Theft Over Counts¶
/usr/local/lib/python3.10/dist-packages/sklearn/base.py:1473: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples,), for example using ravel(). return fit_method(estimator, *args, **kwargs)
RandomForestRegressor()
/usr/local/lib/python3.10/dist-packages/sklearn/utils/validation.py:1339: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel(). y = column_or_1d(y, warn=True)
HistGradientBoostingRegressor()
Random Forests Results Mean Squared Error: 1.4372389240506327 R-squared: 0.21472270183775177 mean absolute error: 0.8258755274261604
Histogram-Based Gradient Boosting Results Mean Squared Error: 1.139272032559403 R-squared: 0.3775255814261935 mean absolute error: 0.7612698567025898
Optimizing the models - Second Approach¶
Auto Theft Models¶
Apply Regression Chain Boosting Algorithm on RF Regressor¶
Regression Chain Random Forest Regressor (Auto Theft) Mean Squared Error: 18.3886164556962 Regression Chain Random Forest Regressor (Auto Theft) R-squared: 0.31338048162557164 Regression Chain Random Forest Regressor (Auto Theft) Mean Absolute Error: 2.798417721518987
Perform Hyper-Parameter Tuning on RF Regression Chain¶
Fitting 5 folds for each of 20 candidates, totalling 100 fits
Best Parameters: {'base_estimator__n_estimators': 50, 'base_estimator__min_samples_split': 10, 'base_estimator__min_samples_leaf': 1, 'base_estimator__max_depth': 5}
Regression Chain Random Forest Regressor (Auto Theft) Mean Squared Error: 15.601535081978382
Regression Chain Random Forest Regressor (Auto Theft) R-squared: 0.4174483692289197
Regression Chain Random Forest Regressor (Auto Theft) Mean Absolute Error: 2.924723449937228
Apply ADA Boosting Algorithm on HBGB¶
/usr/local/lib/python3.10/dist-packages/sklearn/utils/validation.py:1339: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel(). y = column_or_1d(y, warn=True)
AdaBoost Mean Squared Error: 18.659485596532548 AdaBoost R-squared: 0.30326639612753314 AdaBoost mean absolute error: 2.8809512187634048
Perform Hyper-Parameter Tuning on ADA Boosted HBGB Regressor¶
Attempts were made to tune the hyper-parameters of the ADA Boosted HBGB Regressor, but the runs took too long to complete and did not yield significantly better R-squared scores.
Total Count Models¶
Apply Regression Chain Boosting Algorithm on RF Regressor¶
Regression Chain Random Forest Regressor (Total Count) Mean Squared Error: 48.1800003164557 Regression Chain Random Forest Regressor (Total Count) R-squared: 0.7774764943051415 Regression Chain Random Forest Regressor (Total Count) Mean Absolute Error: 5.157500000000001
Perform Hyper-Parameter Tuning on RF Regression Chain¶
Fitting 5 folds for each of 20 candidates, totalling 100 fits
Best Parameters: {'base_estimator__n_estimators': 100, 'base_estimator__min_samples_split': 10, 'base_estimator__min_samples_leaf': 1, 'base_estimator__max_depth': 10}
Regression Chain Random Forest Regressor (Total Count) Mean Squared Error: 66.41763382339101
Regression Chain Random Forest Regressor (Total Count) R-squared: 0.6932444038758028
Regression Chain Random Forest Regressor (Total Count) Mean Absolute Error: 6.172144837821778
Fitting 5 folds for each of 20 candidates, totalling 100 fits
Best Parameters: {'base_estimator__n_estimators': 100, 'base_estimator__min_samples_split': 10, 'base_estimator__min_samples_leaf': 1, 'base_estimator__max_depth': 10}
Regression Chain Random Forest Regressor (Total Count) Mean Squared Error: 68.88399843268999
Regression Chain Random Forest Regressor (Total Count) R-squared: 0.6818532852461192
Regression Chain Random Forest Regressor (Total Count) Mean Absolute Error: 6.253101443805737
Fitting 5 folds for each of 81 candidates, totalling 405 fits
Best Parameters: {'base_estimator__n_estimators': 150, 'base_estimator__min_samples_split': 5, 'base_estimator__min_samples_leaf': 1, 'base_estimator__max_depth': 10}
Regression Chain Random Forest Regressor (Total Count) Mean Squared Error: 71.02147143653406
Regression Chain Random Forest Regressor (Total Count) R-squared: 0.6719811809908387
Regression Chain Random Forest Regressor (Total Count) Mean Absolute Error: 6.345448931345114
Fitting 5 folds for each of 81 candidates, totalling 405 fits
Best Parameters: {'base_estimator__n_estimators': 100, 'base_estimator__min_samples_split': 2, 'base_estimator__min_samples_leaf': 1, 'base_estimator__max_depth': 10}
Regression Chain Random Forest Regressor (Total Count) Mean Squared Error: 69.62343491331661
Regression Chain Random Forest Regressor (Total Count) R-squared: 0.6784381337968257
Regression Chain Random Forest Regressor (Total Count) Mean Absolute Error: 6.2914368593362715
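The "20 candidates" runs are consistent with a randomized search (`n_iter=20`, `cv=5`), while 81 candidates is what an exhaustive grid with three values for each of the four tuned parameters produces (3^4 = 81, or 405 fits at 5 folds). The sketch below builds such a grid for a chained RF; the values are assumptions that include the reported best settings. The keyword naming the chain's inner estimator may differ between scikit-learn releases (this notebook's output uses `base_estimator__`), so the prefix is read from `get_params()` rather than hard-coded.

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, ParameterGrid
from sklearn.multioutput import RegressorChain

chain = RegressorChain(RandomForestRegressor(random_state=1))

# Discover which prefix routes parameters to the chain's inner estimator
params = chain.get_params()
prefix = "estimator__" if "estimator__n_estimators" in params else "base_estimator__"

# Hypothetical grid: 3 values for each of 4 parameters -> 3**4 = 81 candidates
grid = {
    prefix + "n_estimators": [50, 100, 150],
    prefix + "min_samples_split": [2, 5, 10],
    prefix + "min_samples_leaf": [1, 2, 4],
    prefix + "max_depth": [5, 10, None],
}
search = GridSearchCV(chain, grid, cv=5, scoring="r2", verbose=1)

n_candidates = len(ParameterGrid(grid))
print(n_candidates, "candidates,", n_candidates * search.cv, "fits")  # 81 candidates, 405 fits
```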
Apply ADA Boosting Algorithm on HBGB¶
/usr/local/lib/python3.10/dist-packages/sklearn/utils/validation.py:1339: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel(). y = column_or_1d(y, warn=True)
AdaBoost Mean Squared Error: 45.44923474580936 AdaBoost R-squared: 0.7900887716820576 AdaBoost mean absolute error: 5.161318291385233
Perform Hyper-Parameter Tuning on ADA Boosted HBGB Regressor¶
Attempts were made to tune the hyper-parameters of the ADA Boosted HBGB Regressor, but the runs took too long to complete and did not yield significantly better R-squared scores.
Perform Hyper-Parameter Tuning on RF Regressor¶
Fitting 5 folds for each of 20 candidates, totalling 100 fits
/usr/local/lib/python3.10/dist-packages/sklearn/base.py:1473: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples,), for example using ravel(). return fit_method(estimator, *args, **kwargs)
Best Parameters: {'n_estimators': 150, 'min_samples_split': 5, 'min_samples_leaf': 2, 'max_depth': 10}
Random Forest Regressor (Total Count) Mean Squared Error: 71.17906332274143
Random Forest Regressor (Total Count) R-squared: 0.6712533292108969
Random Forest Regressor (Total Count) Mean Absolute Error: 6.347966311436887
Fitting 5 folds for each of 20 candidates, totalling 100 fits
/usr/local/lib/python3.10/dist-packages/sklearn/base.py:1473: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples,), for example using ravel(). return fit_method(estimator, *args, **kwargs)
Best Parameters: {'n_estimators': 50, 'min_samples_leaf': 2, 'max_depth': 10}
Random Forest Regressor (Total Count) Mean Squared Error: 68.1934671493194
Random Forest Regressor (Total Count) R-squared: 0.6850425638048225
Random Forest Regressor (Total Count) Mean Absolute Error: 6.233294034033055
Fitting 5 folds for each of 20 candidates, totalling 100 fits
/usr/local/lib/python3.10/dist-packages/sklearn/base.py:1473: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples,), for example using ravel(). return fit_method(estimator, *args, **kwargs)
Best Parameters: {'n_estimators': 50, 'min_samples_split': 15, 'min_samples_leaf': 2, 'max_depth': 10}
Random Forest Regressor (Total Count) Mean Squared Error: 70.81104307153397
Random Forest Regressor (Total Count) R-squared: 0.6729530626257474
Random Forest Regressor (Total Count) Mean Absolute Error: 6.383808010790825
Perform Hyper-Parameter Tuning on HBGB Regressor¶
Fitting 5 folds for each of 20 candidates, totalling 100 fits
/usr/local/lib/python3.10/dist-packages/sklearn/utils/validation.py:1339: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel(). y = column_or_1d(y, warn=True)
Best Parameters: {'min_samples_leaf': 10, 'max_iter': 300, 'max_depth': 10, 'learning_rate': 0.01, 'l2_regularization': 0.0}
HistGradientBoostingRegressor (Total Count) Mean Squared Error: 68.39689337906064
HistGradientBoostingRegressor (Total Count) R-squared: 0.68410302213826
HistGradientBoostingRegressor (Total Count) Mean Absolute Error: 6.424647018374788
Create a Voting Ensemble Learning Model with the default RF and HBGB Models¶
Voting Regressor with fitted RF and HBGB
/usr/local/lib/python3.10/dist-packages/sklearn/ensemble/_voting.py:694: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel(). y = column_or_1d(y, warn=True)
Voting Regressor (Total Count) Mean Squared Error: 41.42631600357465 Voting Regressor (Total Count) R-squared: 0.8086689704319083 Voting Regressor (Total Count) Mean Absolute Error: 4.853770088554295
Voting Regressor with unfitted RF and HBGB
/usr/local/lib/python3.10/dist-packages/sklearn/ensemble/_voting.py:694: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel(). y = column_or_1d(y, warn=True)
Voting Regressor (Total Count) Mean Squared Error: 41.182782954814 Voting Regressor (Total Count) R-squared: 0.8097937489168987 Voting Regressor (Total Count) Mean Absolute Error: 4.8440160089608995
Final Model¶
Hyper-Parameter Tuned Voting Regressor with unfitted RF and HBGB Regressors
Fitting 5 folds for each of 3 candidates, totalling 15 fits
/usr/local/lib/python3.10/dist-packages/sklearn/ensemble/_voting.py:694: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel(). y = column_or_1d(y, warn=True)
Best Parameters for Voting Regressor: {'n_jobs': -1, 'weights': [1, 2]}
Voting Regressor (Total Count) Mean Squared Error: 40.66584193188937
Voting Regressor (Total Count) R-squared: 0.8121812858181674
Voting Regressor (Total Count) Mean Absolute Error: 4.862563006179832
|  | NEIGHBOURHOOD_158 | HOOD_158 |
|---|---|---|
| 0 | West Queen West (162) | 162 |
| 1 | Morningside Heights (144) | 144 |
| 2 | Moss Park (73) | 73 |
| 3 | Fort York-Liberty Village (163) | 163 |
| 4 | Eglinton East (138) | 138 |
| ... | ... | ... |
| 1091 | Broadview North (57) | 57 |
| 1337 | Guildwood (140) | 140 |
| 1369 | Lambton Baby Point (114) | 114 |
| 1412 | Bayview Woods-Steeles (49) | 49 |
| 1778 | Woodbine-Lumsden (60) | 60 |
159 rows × 2 columns
| | HOOD_158 | OCC_YEAR | OCC_MONTH | Assault | Auto Theft | Break and Enter | Robbery | Theft Over | Total_Count | NEIGHBOURHOOD_158 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 2021 | 1 | 18 | 35 | 7 | 1 | 3 | 62 | West Humber-Clairville (1) |
| 1 | 1 | 2021 | 2 | 17 | 17 | 5 | 1 | 3 | 43 | West Humber-Clairville (1) |
| 2 | 1 | 2021 | 3 | 15 | 20 | 8 | 6 | 6 | 54 | West Humber-Clairville (1) |
| 3 | 1 | 2021 | 4 | 11 | 31 | 4 | 2 | 4 | 52 | West Humber-Clairville (1) |
| 4 | 1 | 2021 | 5 | 18 | 26 | 9 | 5 | 4 | 62 | West Humber-Clairville (1) |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 6630 | 174 | 2024 | 2 | 9 | 0 | 5 | 1 | 1 | 15 | South Eglinton-Davisville (174) |
| 6631 | 174 | 2024 | 3 | 6 | 1 | 0 | 0 | 0 | 7 | South Eglinton-Davisville (174) |
| 6632 | 174 | 2024 | 4 | 12 | 2 | 2 | 0 | 1 | 17 | South Eglinton-Davisville (174) |
| 6633 | 174 | 2024 | 5 | 8 | 1 | 2 | 0 | 1 | 12 | South Eglinton-Davisville (174) |
| 6634 | 174 | 2024 | 6 | 6 | 4 | 1 | 1 | 1 | 13 | South Eglinton-Davisville (174) |
6635 rows × 10 columns
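The table above has one row per neighbourhood and month, one count column per MCI category, and a Total_Count column. A sketch of producing that shape from incident-level records with `pandas.crosstab` follows. The `MCI_CATEGORY` column name and a numeric `OCC_MONTH` are assumptions about the raw data (month names would first need mapping to numbers), and the records are made up. Here Total_Count is simply the sum of the five category columns; in some rows of the table above the two differ (row 0's categories sum to 64 while Total_Count is 62), so the notebook may derive Total_Count differently.

```python
import pandas as pd

# Hypothetical incident-level records: one row per reported incident
incidents = pd.DataFrame({
    "HOOD_158": [1, 1, 1, 1, 174, 174],
    "OCC_YEAR": [2021, 2021, 2021, 2021, 2024, 2024],
    "OCC_MONTH": [1, 1, 1, 2, 6, 6],
    "MCI_CATEGORY": ["Assault", "Auto Theft", "Auto Theft", "Robbery", "Assault", "Theft Over"],
})

# Count incidents per neighbourhood-month (rows) and category (columns)
monthly = pd.crosstab(
    [incidents["HOOD_158"], incidents["OCC_YEAR"], incidents["OCC_MONTH"]],
    incidents["MCI_CATEGORY"],
)
# Ensure all five categories exist as columns, filling absent ones with 0
categories = ["Assault", "Auto Theft", "Break and Enter", "Robbery", "Theft Over"]
monthly = monthly.reindex(columns=categories, fill_value=0)
monthly["Total_Count"] = monthly[categories].sum(axis=1)
monthly = monthly.reset_index()
monthly.columns.name = None
print(monthly)
```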
| | NEIGHBOURHOOD_158 | HOOD_158 | OCC_YEAR | OCC_MONTH | Assault | Auto Theft | Break and Enter | Robbery | Theft Over | Total_Count |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | West Humber-Clairville (1) | 1 | 2021 | 1 | 18 | 35 | 7 | 1 | 3 | 62 |
| 1 | West Humber-Clairville (1) | 1 | 2021 | 2 | 17 | 17 | 5 | 1 | 3 | 43 |
| 2 | West Humber-Clairville (1) | 1 | 2021 | 3 | 15 | 20 | 8 | 6 | 6 | 54 |
| 3 | West Humber-Clairville (1) | 1 | 2021 | 4 | 11 | 31 | 4 | 2 | 4 | 52 |
| 4 | West Humber-Clairville (1) | 1 | 2021 | 5 | 18 | 26 | 9 | 5 | 4 | 62 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 6630 | South Eglinton-Davisville (174) | 174 | 2024 | 2 | 9 | 0 | 5 | 1 | 1 | 15 |
| 6631 | South Eglinton-Davisville (174) | 174 | 2024 | 3 | 6 | 1 | 0 | 0 | 0 | 7 |
| 6632 | South Eglinton-Davisville (174) | 174 | 2024 | 4 | 12 | 2 | 2 | 0 | 1 | 17 |
| 6633 | South Eglinton-Davisville (174) | 174 | 2024 | 5 | 8 | 1 | 2 | 0 | 1 | 12 |
| 6634 | South Eglinton-Davisville (174) | 174 | 2024 | 6 | 6 | 4 | 1 | 1 | 1 | 13 |
6635 rows × 10 columns
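The lookup table keeps scattered original row labels (0, 1, ..., 1091, 1337), which suggests it was built by dropping duplicate name/code pairs from row-level data; the names were then joined onto the monthly counts and moved to the first column. A sketch with made-up rows; the left join is an assumption chosen so that a code without a name would still keep its counts.

```python
import pandas as pd

# Hypothetical row-level data carrying both the code and the name
incidents = pd.DataFrame({
    "HOOD_158": [162, 144, 162, 1],
    "NEIGHBOURHOOD_158": ["West Queen West (162)", "Morningside Heights (144)",
                          "West Queen West (162)", "West Humber-Clairville (1)"],
})
# One row per neighbourhood, like the lookup table above
lookup = incidents[["NEIGHBOURHOOD_158", "HOOD_158"]].drop_duplicates()

# Hypothetical monthly counts keyed only by HOOD_158
monthly = pd.DataFrame({
    "HOOD_158": [1, 1, 162],
    "OCC_YEAR": [2021, 2021, 2021],
    "OCC_MONTH": [1, 2, 1],
    "Total_Count": [62, 43, 30],
})
# Left join keeps every monthly row and preserves its order
named = monthly.merge(lookup, on="HOOD_158", how="left")
# Move the name column to the front, as in the reordered table
named = named[["NEIGHBOURHOOD_158"] + [c for c in named.columns if c != "NEIGHBOURHOOD_158"]]
print(named)
```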
Results¶
Visualizations based on current data¶
Total count of Criminal Incidents for past three years¶
Crime Statistics for Previous Years with a Month Breakdown¶
Crime Statistics for 2021¶
Five neighbourhoods with the highest total counts in 2021:
| | NEIGHBOURHOOD_158 | OCC_YEAR | Total_Count |
|---|---|---|---|
| 139 | West Humber-Clairville (1) | 2021 | 849 |
| 93 | Moss Park (73) | 2021 | 778 |
| 36 | Downtown Yonge East (168) | 2021 | 774 |
| 156 | York University Heights (27) | 2021 | 562 |
| 125 | St Lawrence-East Bayfront-The Islands | 2021 | 527 |
Five neighbourhoods with the lowest total counts in 2021:
| | NEIGHBOURHOOD_158 | OCC_YEAR | Total_Count |
|---|---|---|---|
| 78 | Lambton Baby Point (114) | 2021 | 42 |
| 56 | Guildwood (140) | 2021 | 48 |
| 150 | Woodbine-Lumsden (60) | 2021 | 51 |
| 113 | Princess-Rosethorn (10) | 2021 | 54 |
| 88 | Markland Wood (12) | 2021 | 58 |
Crime Statistics for 2022¶
Five neighbourhoods with the highest total counts in 2022:
| | NEIGHBOURHOOD_158 | OCC_YEAR | Total_Count |
|---|---|---|---|
| 139 | West Humber-Clairville (1) | 2022 | 1146 |
| 93 | Moss Park (73) | 2022 | 702 |
| 156 | York University Heights (27) | 2022 | 694 |
| 36 | Downtown Yonge East (168) | 2022 | 690 |
| 152 | Yonge-Bay Corridor (170) | 2022 | 604 |
Five neighbourhoods with the lowest total counts in 2022:
| | NEIGHBOURHOOD_158 | OCC_YEAR | Total_Count |
|---|---|---|---|
| 56 | Guildwood (140) | 2022 | 53 |
| 9 | Bayview Woods-Steeles (49) | 2022 | 59 |
| 150 | Woodbine-Lumsden (60) | 2022 | 67 |
| 4 | Avondale (153) | 2022 | 70 |
| 64 | Humber Heights-Westmount (8) | 2022 | 73 |
Crime Statistics for 2023¶
Five neighbourhoods with the highest total counts in 2023:
| | NEIGHBOURHOOD_158 | OCC_YEAR | Total_Count |
|---|---|---|---|
| 139 | West Humber-Clairville (1) | 2023 | 1371 |
| 156 | York University Heights (27) | 2023 | 847 |
| 36 | Downtown Yonge East (168) | 2023 | 790 |
| 93 | Moss Park (73) | 2023 | 770 |
| 152 | Yonge-Bay Corridor (170) | 2023 | 717 |
Five neighbourhoods with the lowest total counts in 2023:
| | NEIGHBOURHOOD_158 | OCC_YEAR | Total_Count |
|---|---|---|---|
| 150 | Woodbine-Lumsden (60) | 2023 | 58 |
| 78 | Lambton Baby Point (114) | 2023 | 68 |
| 56 | Guildwood (140) | 2023 | 83 |
| 64 | Humber Heights-Westmount (8) | 2023 | 83 |
| 88 | Markland Wood (12) | 2023 | 96 |
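The yearly highest and lowest tables can be produced by summing the monthly totals per neighbourhood and year, then taking `nlargest` and `nsmallest` within each year. A sketch on made-up totals; `extremes` is a hypothetical helper, and `n=2` is used only because the toy data has three neighbourhoods (the tables above use five).

```python
import pandas as pd

# Hypothetical monthly totals for three neighbourhoods
monthly = pd.DataFrame({
    "NEIGHBOURHOOD_158": ["A", "A", "B", "B", "C", "C", "A", "B", "C"],
    "OCC_YEAR": [2021, 2021, 2021, 2021, 2021, 2021, 2022, 2022, 2022],
    "OCC_MONTH": [1, 2, 1, 2, 1, 2, 1, 1, 1],
    "Total_Count": [50, 60, 5, 7, 20, 25, 70, 4, 30],
})
# Sum the months of each year for every neighbourhood
yearly = monthly.groupby(["NEIGHBOURHOOD_158", "OCC_YEAR"], as_index=False)["Total_Count"].sum()

def extremes(df, year, n=5):
    """Return the n highest and n lowest neighbourhood totals for one year."""
    one_year = df[df["OCC_YEAR"] == year]
    return one_year.nlargest(n, "Total_Count"), one_year.nsmallest(n, "Total_Count")

top, bottom = extremes(yearly, 2021, n=2)
print(top)
print(bottom)
```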
Comparison of Actual and Predicted Data for 2024¶
| | NEIGHBOURHOOD_158 | HOOD_158 | OCC_YEAR | OCC_MONTH | Total_Count | Predicted_Total_Count |
|---|---|---|---|---|---|---|
| 36 | West Humber-Clairville (1) | 1 | 2024 | 1 | 110 | 113.0 |
| 37 | West Humber-Clairville (1) | 1 | 2024 | 2 | 101 | 108.0 |
| 38 | West Humber-Clairville (1) | 1 | 2024 | 3 | 79 | 110.0 |
| 39 | West Humber-Clairville (1) | 1 | 2024 | 4 | 93 | 110.0 |
| 40 | West Humber-Clairville (1) | 1 | 2024 | 5 | 104 | 110.0 |
| ... | ... | ... | ... | ... | ... | ... |
| 6630 | South Eglinton-Davisville (174) | 174 | 2024 | 2 | 15 | 11.0 |
| 6631 | South Eglinton-Davisville (174) | 174 | 2024 | 3 | 7 | 14.0 |
| 6632 | South Eglinton-Davisville (174) | 174 | 2024 | 4 | 17 | 13.0 |
| 6633 | South Eglinton-Davisville (174) | 174 | 2024 | 5 | 12 | 13.0 |
| 6634 | South Eglinton-Davisville (174) | 174 | 2024 | 6 | 13 | 12.0 |
948 rows × 6 columns
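The predicted column is attached to the 2024 subset of the data. A sketch of doing that on an explicit copy, which keeps the subset independent of the full frame and avoids pandas' `SettingWithCopyWarning` about assigning to a slice. The rows and the `y_pred` values are made up, and rounding to whole incidents is an assumption based on the `.0` values in the table.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({
    "NEIGHBOURHOOD_158": ["West Humber-Clairville (1)"] * 3,
    "HOOD_158": [1, 1, 1],
    "OCC_YEAR": [2023, 2024, 2024],
    "OCC_MONTH": [12, 1, 2],
    "Total_Count": [120, 110, 101],
})
# Hypothetical model output for the two 2024 rows
y_pred = np.array([112.6, 107.8])

# .copy() gives an independent frame, so the assignment below raises no SettingWithCopyWarning
predicted_2024 = data[data["OCC_YEAR"] == 2024].copy()
predicted_2024["Predicted_Total_Count"] = np.round(y_pred)
print(predicted_2024)
```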
Overview: This chart compares the actual and predicted crime counts for each of the first six months of 2024.
| | OCC_MONTH | Total_Count | Predicted_Total_Count |
|---|---|---|---|
| 0 | 1 | 3518 | 3189.0 |
| 1 | 2 | 3185 | 3008.0 |
| 2 | 3 | 3207 | 3340.0 |
| 3 | 4 | 3201 | 3406.0 |
| 4 | 5 | 3471 | 3732.0 |
| 5 | 6 | 3101 | 3676.0 |
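Key Observations:
- The predictions stay within about 10% of the actual totals from January to May, but the model underestimates January and February and overestimates March through June.
- The largest gap is in June, where the model predicts 3,676 incidents against 3,101 actual (about 19% too high).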
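The month-by-month gaps can be quantified directly from the aggregated table. This sketch hard-codes the six monthly totals shown above and computes the signed error (predicted minus actual) and the percentage error for each month.

```python
import pandas as pd

monthly = pd.DataFrame({
    "OCC_MONTH": [1, 2, 3, 4, 5, 6],
    "Total_Count": [3518, 3185, 3207, 3201, 3471, 3101],
    "Predicted_Total_Count": [3189.0, 3008.0, 3340.0, 3406.0, 3732.0, 3676.0],
})
# Positive error means over-prediction, negative means under-prediction
monthly["Error"] = monthly["Predicted_Total_Count"] - monthly["Total_Count"]
monthly["Pct_Error"] = (100 * monthly["Error"] / monthly["Total_Count"]).round(1)
print(monthly)
```

On these numbers the percentage errors are -9.4, -5.6, 4.1, 6.4, 7.5 and 18.5 for January through June.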
Visualizations based on the Predictions¶
array([1, 2, 3, 4, 5, 6])
| | NEIGHBOURHOOD_158 | HOOD_158 | OCC_YEAR | OCC_MONTH |
|---|---|---|---|---|
| 0 | West Humber-Clairville (1) | 1 | 2024 | 7 |
| 1 | West Humber-Clairville (1) | 1 | 2024 | 8 |
| 2 | West Humber-Clairville (1) | 1 | 2024 | 9 |
| 3 | West Humber-Clairville (1) | 1 | 2024 | 10 |
| 4 | West Humber-Clairville (1) | 1 | 2024 | 11 |
| ... | ... | ... | ... | ... |
| 6630 | South Eglinton-Davisville (174) | 174 | 2027 | 8 |
| 6631 | South Eglinton-Davisville (174) | 174 | 2027 | 9 |
| 6632 | South Eglinton-Davisville (174) | 174 | 2027 | 10 |
| 6633 | South Eglinton-Davisville (174) | 174 | 2027 | 11 |
| 6634 | South Eglinton-Davisville (174) | 174 | 2027 | 12 |
6635 rows × 4 columns
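The prediction frame lists each neighbourhood for every month from July 2024 to December 2027. One way to build such a grid is a cross join of the neighbourhood lookup with a calendar of future months. This is a sketch with two neighbourhoods, not necessarily how the notebook built its frame; `how="cross"` requires pandas 1.2 or later.

```python
import pandas as pd

lookup = pd.DataFrame({
    "NEIGHBOURHOOD_158": ["West Humber-Clairville (1)", "South Eglinton-Davisville (174)"],
    "HOOD_158": [1, 174],
})
# Calendar of future months: July 2024 through December 2027 (42 months)
months = pd.period_range("2024-07", "2027-12", freq="M")
calendar = pd.DataFrame({"OCC_YEAR": months.year, "OCC_MONTH": months.month})

# Pair every neighbourhood with every future month
future = lookup.merge(calendar, how="cross")
print(future.head())
print(len(future))
```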
| | NEIGHBOURHOOD_158 | HOOD_158 | OCC_YEAR | OCC_MONTH | Total_Counts |
|---|---|---|---|---|---|
| 0 | West Humber-Clairville (1) | 1 | 2024 | 7 | 111 |
| 1 | West Humber-Clairville (1) | 1 | 2024 | 8 | 111 |
| 2 | West Humber-Clairville (1) | 1 | 2024 | 9 | 108 |
| 3 | West Humber-Clairville (1) | 1 | 2024 | 10 | 110 |
| 4 | West Humber-Clairville (1) | 1 | 2024 | 11 | 107 |
| ... | ... | ... | ... | ... | ... |
| 6630 | South Eglinton-Davisville (174) | 174 | 2027 | 8 | 12 |
| 6631 | South Eglinton-Davisville (174) | 174 | 2027 | 9 | 13 |
| 6632 | South Eglinton-Davisville (174) | 174 | 2027 | 10 | 14 |
| 6633 | South Eglinton-Davisville (174) | 174 | 2027 | 11 | 14 |
| 6634 | South Eglinton-Davisville (174) | 174 | 2027 | 12 | 15 |
6635 rows × 5 columns
Anticipated Crime Statistics for next six months of 2024¶
Overview: This graph depicts the predicted crime trend for the second half of 2024 (July to December).
Key Observations:
- Predicted crime activity rises from July to August and again in November, and falls in September and December.
- These fluctuations may reflect seasonal changes, public events, or social patterns. The model's only inputs are neighbourhood, year, and month, so such explanations are interpretations of the trend rather than effects the model measures.
Anticipated Total count of Crime Activities for upcoming three years¶
| | OCC_YEAR | Total_Counts |
|---|---|---|
| 0 | 2025 | 42175 |
| 1 | 2026 | 42175 |
| 2 | 2027 | 42167 |
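The annual totals above and the monthly breakdown that follows are sums of the per-neighbourhood predictions. A sketch of both aggregations on made-up rows; 2024 is left out because only its second half is predicted, matching the years shown in the table.

```python
import pandas as pd

# Hypothetical per-neighbourhood predictions
preds = pd.DataFrame({
    "HOOD_158": [1, 174, 1, 174, 1, 174],
    "OCC_YEAR": [2024, 2024, 2025, 2025, 2025, 2025],
    "OCC_MONTH": [12, 12, 1, 1, 2, 2],
    "Total_Counts": [107, 15, 113, 11, 108, 14],
})
# 2024 only covers July-December, so keep full years only
full_years = preds[preds["OCC_YEAR"] >= 2025]
annual = full_years.groupby("OCC_YEAR", as_index=False)["Total_Counts"].sum()
by_month = full_years.groupby(["OCC_YEAR", "OCC_MONTH"], as_index=False)["Total_Counts"].sum()
print(annual)
print(by_month)
```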
Anticipated Crime Statistics for Upcoming Years with a Month Breakdown¶
| | OCC_YEAR | OCC_MONTH | Total_Counts |
|---|---|---|---|
| 0 | 2025 | 1 | 3189 |
| 1 | 2025 | 2 | 3008 |
| 2 | 2025 | 3 | 3340 |
| 3 | 2025 | 4 | 3406 |
| 4 | 2025 | 5 | 3732 |
| 5 | 2025 | 6 | 3676 |
| 6 | 2025 | 7 | 3659 |
| 7 | 2025 | 8 | 3711 |
| 8 | 2025 | 9 | 3575 |
| 9 | 2025 | 10 | 3632 |
| 10 | 2025 | 11 | 3685 |
| 11 | 2025 | 12 | 3562 |
| 12 | 2026 | 1 | 3189 |
| 13 | 2026 | 2 | 3008 |
| 14 | 2026 | 3 | 3340 |
| 15 | 2026 | 4 | 3406 |
| 16 | 2026 | 5 | 3732 |
| 17 | 2026 | 6 | 3676 |
| 18 | 2026 | 7 | 3659 |
| 19 | 2026 | 8 | 3711 |
| 20 | 2026 | 9 | 3575 |
| 21 | 2026 | 10 | 3632 |
| 22 | 2026 | 11 | 3685 |
| 23 | 2026 | 12 | 3562 |
| 24 | 2027 | 1 | 3189 |
| 25 | 2027 | 2 | 3008 |
| 26 | 2027 | 3 | 3332 |
| 27 | 2027 | 4 | 3406 |
| 28 | 2027 | 5 | 3732 |
| 29 | 2027 | 6 | 3676 |
| 30 | 2027 | 7 | 3659 |
| 31 | 2027 | 8 | 3711 |
| 32 | 2027 | 9 | 3575 |
| 33 | 2027 | 10 | 3632 |
| 34 | 2027 | 11 | 3685 |
| 35 | 2027 | 12 | 3562 |
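The monthly predictions repeat from year to year, and the January-to-June 2025 values equal the January-to-June 2024 predictions in the comparison table (3,189, 3,008, 3,340, 3,406, 3,732, 3,676). This is expected from tree-based models such as RF and HBGB: their splits only separate year values seen in training, so every OCC_YEAR from 2024 onward falls on the same side of every year split and, for the same neighbourhood and month, receives the same prediction. The sketch below demonstrates this on hypothetical data with a rising yearly trend. The 2027 figures are 8 lower than 2025 and 2026 in March (3,332 vs. 3,340), in the annual total (42,167 vs. 42,175), and for Woodbine-Lumsden (85 vs. 93). One possible explanation, not confirmed by the source, is a single missing neighbourhood-month row: the prediction frame has 6,635 rows, while 158 neighbourhoods × 42 months would give 6,636.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
years = rng.integers(2021, 2025, size=500)  # training years 2021-2024
months = rng.integers(1, 13, size=500)
# Synthetic target with a clear upward trend over the years
y = 10 * (years - 2021) + months + rng.normal(0, 1, size=500)
X = np.column_stack([years, months])

model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

# Same month (March) in the last training year and in three future years
X_future = np.array([[2024, 3], [2025, 3], [2026, 3], [2027, 3]])
p = model.predict(X_future)
print(p)  # the four predictions are identical: the trend stops at 2024
```

If a year-over-year trend were the goal, the year would need a component that can extrapolate, such as a linear term or a time-series model.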
Anticipated Crime Statistics for 2025¶
Five neighbourhoods with the highest predicted total counts in 2025:
| | NEIGHBOURHOOD_158 | OCC_YEAR | Total_Counts |
|---|---|---|---|
| 139 | West Humber-Clairville (1) | 2025 | 1322 |
| 156 | York University Heights (27) | 2025 | 789 |
| 36 | Downtown Yonge East (168) | 2025 | 760 |
| 93 | Moss Park (73) | 2025 | 747 |
| 152 | Yonge-Bay Corridor (170) | 2025 | 679 |
Five neighbourhoods with the lowest predicted total counts in 2025:
| | NEIGHBOURHOOD_158 | OCC_YEAR | Total_Counts |
|---|---|---|---|
| 150 | Woodbine-Lumsden (60) | 2025 | 93 |
| 78 | Lambton Baby Point (114) | 2025 | 95 |
| 64 | Humber Heights-Westmount (8) | 2025 | 113 |
| 107 | Old East York (58) | 2025 | 113 |
| 19 | Broadview North (57) | 2025 | 115 |
Anticipated Crime Statistics for 2026¶
Five neighbourhoods with the highest predicted total counts in 2026:
| | NEIGHBOURHOOD_158 | OCC_YEAR | Total_Counts |
|---|---|---|---|
| 139 | West Humber-Clairville (1) | 2026 | 1322 |
| 156 | York University Heights (27) | 2026 | 789 |
| 36 | Downtown Yonge East (168) | 2026 | 760 |
| 93 | Moss Park (73) | 2026 | 747 |
| 152 | Yonge-Bay Corridor (170) | 2026 | 679 |
Five neighbourhoods with the lowest predicted total counts in 2026:
| | NEIGHBOURHOOD_158 | OCC_YEAR | Total_Counts |
|---|---|---|---|
| 150 | Woodbine-Lumsden (60) | 2026 | 93 |
| 78 | Lambton Baby Point (114) | 2026 | 95 |
| 64 | Humber Heights-Westmount (8) | 2026 | 113 |
| 107 | Old East York (58) | 2026 | 113 |
| 19 | Broadview North (57) | 2026 | 115 |
Anticipated Crime Statistics for 2027¶
Five neighbourhoods with the highest predicted total counts in 2027:
| | NEIGHBOURHOOD_158 | OCC_YEAR | Total_Counts |
|---|---|---|---|
| 139 | West Humber-Clairville (1) | 2027 | 1322 |
| 156 | York University Heights (27) | 2027 | 789 |
| 36 | Downtown Yonge East (168) | 2027 | 760 |
| 93 | Moss Park (73) | 2027 | 747 |
| 152 | Yonge-Bay Corridor (170) | 2027 | 679 |
Five neighbourhoods with the lowest predicted total counts in 2027:
| | NEIGHBOURHOOD_158 | OCC_YEAR | Total_Counts |
|---|---|---|---|
| 150 | Woodbine-Lumsden (60) | 2027 | 85 |
| 78 | Lambton Baby Point (114) | 2027 | 95 |
| 64 | Humber Heights-Westmount (8) | 2027 | 113 |
| 107 | Old East York (58) | 2027 | 113 |
| 19 | Broadview North (57) | 2027 | 115 |
Summary and Conclusion¶
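In conclusion, our project examined Toronto's crime trends from 2021 to mid-2024, highlighting seasonal patterns, neighbourhood-specific crime rates, and the challenges of predictive modelling with real-world data. The final tuned Voting Regressor (RF and HBGB, weights [1, 2]) reached an R-squared of about 0.81 and a mean absolute error of about 4.9 incidents per neighbourhood-month on the total count. West Humber-Clairville recorded the highest total count in every year from 2021 to 2023 and is predicted to remain the highest. The forecasts for 2025 to 2027 are nearly identical from year to year (about 42,170 incidents annually), so they capture seasonal and neighbourhood differences rather than a long-term trend. We hope these insights can help us better understand crime patterns and improve safety in the city.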